Preface


The present book comprises a 
collection of hitherto unpublished papers which were read in Baton 
Rouge at a Louisiana State University Psychology Symposium, Feb- 
ruary 1958, the aim of which was to present the most recent concepts 
and developments in objective personality assessment. Accordingly, 
nationally known experts from virtually every section of the country 
were invited to read papers concerned with their current thinking 
about problems of objective personality measurement, together with 
a description of some of their present research and its background. 

The final result is considered to be an accurate representation of 
what leaders in the field of objective personality testing are now 
doing. This volume tells what many of the personality tests of 
tomorrow will look like and what the rationale behind them will be. 
There was, of course, no general agreement about personality meas- 
urement among the authors. About the only point of real agreement 
was that personality testing should be objective. Even here, the 
historian of the symposium, Robert I. Watson, a clinician of note 
though not a dedicated objective measurement psychologist, was 
moved to protest when the clinical usefulness of the Rorschach and 
other projective devices was seriously questioned during one of the 
discussion periods. 

Each of the contributors has developed a definite approach to 
measuring one or more facets of personality; and he acknowledges, 
at least by implication, a serene confidence in his methodology. He 
sees room for improvement, of course, in what he is doing and he is
willing to discuss it. He sees even greater room for improvement in
what his colleagues are doing, at least as they revealed their pro-
fessional activities at the symposium. 

It is impossible to capture in print the total atmosphere of a
symposium where oral exchanges are many, where gestures and
expressions often convey as much meaning as the words used. Yet, if
one were to describe the general flavor of the meetings, it would be 
that one distinctly felt that the speakers knew what they were doing 
and that they enjoyed doing it. This was reflected in the willingness,
almost eagerness, of each speaker to reply to all questions raised by
members of a sizeable audience and to answer without cavil.

Thus far, the present editors have been referring to the invited 
speakers who assembled to present their papers. Each of the 
speakers, it is true, exerted some form of catalytic effect upon the 
others; yet there was more. There was the audience, of course, and 
there were several people who were invited because the editors, in 
planning the symposium, believed these persons would contribute 
directly to the atmosphere of scholarly enthusiasm. There was, for 
instance, H. Max Houtchens of the Veterans Administration Central 
Office in Washington. He has a trick of quietly asking some of the 
most pointed questions in the kindliest way imaginable and serves, 
thereby, as a first-order clarifier of cloudy issues. 

Out of the informal interchange several points appeared. The 
speakers had nothing against projective tests as such but they liked 
objective tests better, at least, as they conceived of them. As may 
be noted in the first chapter, the conceptions of objective vary some- 
what, even among a group of experts such as the authors of this 
book. But objective in their fashion, the speakers at the symposium 
liked objective tests for reasons which may be summarized as fol- 
lows: 

1. They are usually easier to administer (often requiring little 
training of test examiners). 

2. They are more easily evaluated. It is usually much easier to
gauge the reliability and validity of objective tests than of subjective
tests. It is thus easier to discover the errors in one’s measurements,
something less often possible in the case of subjective assessment.

3. Objective testing is more likely to contribute to constructing a 
body of theory and generalization about human behavior. If there is 
to be a true “science of personality,” a body of integrated constructs
anchored by operational definitions to observables, the observables 
will have to be measured objectively. 

The present symposium is part of the graduate training program
in psychology at Louisiana State University. It would have been
much smaller in scope were it not for the broad professional vision
of the administrators of the Louisiana State Department of Hospi- 
tals. This state agency supplies funds for training and research in 
psychiatry, psychiatric social work, psychiatric nursing, and clinical 
psychology, with provision for visiting lecturers being typically in- 
cluded in such grants. This is one product of the firm conviction of 
the administrators of the Department of Hospitals that those who serve
the State should have the best training possible. Therefore it is a
pleasure to thank Mr. Jesse H. Bankston, Director of the Department
of Hospitals, and his program directors, Mr. Winborn E. Davis and
Mr. E. R. Rogillio, for their encouragement and for their uncom- 
promising stand on quality in professional training. The present 
symposium is but one product of their far-sightedness. 

Manuscript preparation is always a chore and sometimes irksome. 
The present task was less irksome than usual because Mr. Arthur 
Kaufman checked the references, while Mrs. Vera Foil and Mrs. 
Floy Brown did the final typing of the manuscript. Mrs. Sylvia Berg 
is to be thanked heartily for her work with the Author Index; Mrs. 
June T. Bradford for her completion of the Subject Index. 

Bernard M. Bass 

Irwin A. Berg 


Baton Rouge, Louisiana 
Historical Review of Objective
Personality Testing: The Search 
for Objectivity 


Robert I. Watson

Northwestern University 


The noted philosopher of science,
Herbert Feigl, considers a major standard of science to be
intersubjective testability. Concerning intersubjective testability he
writes:


This is only a more adequate formulation of what is generally meant by the
“objectivity” of science. What is here involved is not only the freedom from
personal or cultural bias or partiality, but, even more fundamentally, the
requirement that the knowledge claims of science be in principle capable of test
(confirmation or disconfirmation, at least indirectly and to some degree) on the
part of any person properly equipped with intelligence and the technical
devices of observation or experimentation. The term intersubjective stresses the
social nature of the scientific enterprise. If there be any “truths” that are
accessible only to privileged individuals, such as mystics or visionaries, that is,
knowledge claims which by their very nature cannot independently be checked
by anyone else, then such “truths” are not of the kind that we seek in the
sciences. The criterion of intersubjective testability thus delimits the scientific
from the nonscientific activities of man (15, p. 11).


I would add that objectivity is a goal of science, not a prerequisite
for scientific endeavors. Objectivity is not absolute, but relative. It
is not unusual in science for basic phenomena to be first described
in a qualitative way; objective methods emerge only upon more
intensive study. Our efforts are in the direction of increasing
objectivity whenever possible, but this does not mean that we can
neglect problems simply because they are not yet objective. At
least some of us select problems which we feel are capable of being
rendered more objective, and make this search for increasing
objectivity our research task. For example, in Chapter 9 by Hunt, we
shall find that a demonstration of the reliability of clinical judgment
is a means whereby the clinician, himself, becomes a more objective
instrument. In the present chapter, a major theme is the search for
objectivity in personality testing.

Of necessity, my topic must be considered in a somewhat narrower
framework than the entire scope of personality theory. A major
omission in the consideration of objectivity in personality evaluation
is the argument, advanced by philosophical and phenomenological
characterologists, as to why one should go beyond the objective
approach. This otherwise serious omission is tempered somewhat by
the fact that the characterologists have not been particularly
interested in psychological testing, despite the fact that many
projective tests can be shown to be interpretable on phenomenological
principles.

In order to place objective personality testing in historical
perspective, it is necessary to say something about objective
psychological testing in general. We cannot ignore early mental testing
in our search for the beginnings of personality testing, for to do so
would be to ignore a truism of historical research: the beginnings of
attention to a topic may not be referred to in the same manner as the
topic is referred to in later years. Personality, as our various
conceptions now regard it, was not a systematic rubric in the earlier
psychological traditions. Lack of specific reference, however, does
not prevent us from seeing, in the perspective of the present day,
some of the aspects of what we now call personality. So we begin
with the history of testing, not the history of personality testing
narrowly regarded.


Beginnings of Objective Testing

[...] objective psychological testing begins [...] the subject matter
of psychology [...]. We turn to a classification of tests in the 1910
edition of Whipple’s Manual of Mental and Physical Tests. To
Whipple (37), tests served the purpose of determining and measuring
some phase of mental capacity or trait. I would
like to add parenthetically that even in 1910 he could plead that
what was needed was not new tests, but an exhaustive investigation
of those already available. But to return to his classification of
mental tests, the major headings (disregarding the anthropometric)
were physical and motor capacity, sensory capacity, attention and
perception, description and report, association, learning, and memory,
suggestibility, imagination and invention, and intelligence. Note
that there is no mention of personality. However, the tests of
suggestibility and of imagination and invention could be called
personality tests in today’s perspective. It would also be possible to
include description and report in the scope of personality. Note also
that tests for emotion were not mentioned, for measures of emotion
came later. Many of the personality questionnaires of the twenties
were called measures of emotionality. So, too, were more objective
efforts, such as the X-O, or cross-out, tests of Pressey, in which the
number of words found to be unpleasant was the affectivity score.
With this justification of what is included in the discussion to
follow, we may now turn to the history of mental tests.

Sir Francis Galton shares with James McKeen Cattell the
founding of psychological testing. As early as 1882, Galton had
established a small laboratory in London where, for a small fee,
individuals could take a series of physical measurements and tests
of reaction time and sensory acuity. (One might ask in passing
whether this payment meant that he was the first psychological
practitioner.) The very fact that he thought people would be
interested in their standing on these measures shows his test
orientation.

Galton was primarily interested in no less than an inventory of
human abilities. He related these to his evolutionary views and to
his studies of inheritance, but the fact remains that he conceived
of his various measures as tapping as broad a spectrum of
psychological characteristics as was possible. If the term personality
had been used as it is now, I believe he would not have hesitated to
use it to describe some of his efforts.

In a paper published in 1890, in which he coined the term mental
tests, Cattell proposed a standard series of tests to be applied for
“discovering the constancy of mental processes, their interdependence,
and their variation under different circumstances” (6, p. 373).

He offered both a select list of ten tests then being used in the
Psychological Laboratory of the University of Pennsylvania and a
longer list of 50 others proposed for further consideration. The ten
tests were dynamometer pressure, rate of movement, two-point
threshold, pain sensitivity, least noticeable difference in weight,
reaction time for sound, time for naming colors, bisection of a
50 cm. line, judgment of ten seconds’ time, and number of
letters repeated on once hearing. The list of 50 was essentially
similar. The fact that Galton (6) contributed a number of comments
at the end of this article gives unequivocal evidence of the
connection between Galton’s interest in individual differences and
the mental test movement.

Earlier, in 1883, with his already formed interests in individual
differences and in reaction time as a measure of intelligence, Cattell
had gone to Wundt’s laboratory. There he had completed his
doctoral dissertation on his own problem of individual differences in
reaction times, which Wundt, it might be added, had viewed
dubiously as a suitable problem for a psychologist to undertake.

After Cattell’s sojourn at the University of Pennsylvania, he
moved to Columbia University, where he continued his testing
program with essentially the same battery of tests. After several
years’ data had been collected, a monograph by Clark Wissler (39)
appeared in 1901 reporting the findings. Correlation between results
from the various tests and academic class standing was negligible.
Moreover, the tests were no more intercorrelated among themselves
than they were related to class standing. This was in sharp contrast
to the substantial correlations found between standing in the
various college subjects. These disappointing results, plus another
negative trial made in Titchener’s laboratory by Sharp (29), probably
did much to make psychologists lose interest in the topic. Certainly,
other students of Titchener and Cattell did not follow up these
matters with laboratory devices. These two men were training the
majority of psychologists who did not receive their training in
Europe. On the Continent, Wundt’s emphasis upon the generalized
human mind, a view which was shared by most other European
psychologists, did nothing to encourage further exploration.

The interest in simple sensory and motor tests during this period
was [...] laboratory-bound. Interest in individual differences was
central. Measuring devices were single tests, not [...]. The now usual
checks on reliability were [...], aside from a certain similarity of
[...]. The first period of objective testing, then, is that of the
laboratory.

So we have seen that initial enthusiasm for mental tests in the
United States was met by negative results, and this particular line
of development lapsed in what may be, for convenience, referred
to as the laboratory period in psychological testing.

Early Lack of Concern for Objective Testing

It is pertinent to pause to consider psychologists’ views in relation
to this search for objectivity. It is probable that the question of
objectivity did not concern them because of the origins of test
materials in the laboratory. Reaction time devices measure reaction
time; learning nonsense syllables is learning. The process measured
was defined by the material, just as in the laboratory today one does
not ask, “Are we really measuring learning?” when serial learning
lists are exposed or maze paths are threaded. These measures had
what we now would call content validity. Content validity is the
degree to which the test samples the universe of content specified,
as in an achievement test and in the usual measures for the
experiments in learning. The step from measuring reaction time to
using it for the measurement of intelligence, “because intelligence
calls for speedy reaction,” seemed plausible but of no great
theoretical moment. It was not then seen that it was a great leap from
observed behavior to construct.

By and large, the question of objectivity was not verbalized
during these years. Psychology, after all, was still the study of mental
structures or functions, and introspection the method for advancement
of psychological knowledge. For example, Whipple (37), in
his 1910 authoritative and widely used test manual, does not mention
objectivity. He does, however, speak of standardization of
conditions, which is conducive to objectivity. Familiarity with
instructions and their clarity were also stressed. General knowledge
of the literature and an inspection of some of the textbooks of the
period, however, did not reveal discussion of objectivity as a topic.
After all, psychologists could not use objective in referring to a
subjective science. But this does not mean they were unaware of
the problem. In fact, the centuries-old question of the personal
equation, which Wundt’s students and others investigated years
later, is a recognition of precisely this point. So, too, is the
psychologist’s fallacy of James, the confusion of the personal standpoint
with the mental facts. Training of introspectors served the same
function of increasing what we could call objectivity.
Intelligence Testing

It is tempting, but not particularly germane to the major issue, to 
turn now to the work of Binet who, after working with similar 
sensory and motor tests with similarly unproductive results, even- 
tually found in more complex or higher mental functions the means 
to measure intelligence. But this is part of the history of intelligence 
testing of 1900-1920. In the United States this history of intelligence 
testing was not closely bound to personality testing, for various 
reasons. Interest in and research on intelligence testing were 
directly related to Binet's efforts, but the others who came after him
did not continue his systematic analytic interests. Those who
followed Binet were pragmatic and interested in the application of
intelligence tests to social matters, such as mental retardation, school
placement, and the like. But they were so absorbed with their 
instruments of measurement that they were not very much inter- 
ested in problems beyond these instruments. 

As the well-known definition would have it, psychologists of that 
day, and for some years to come, tended to consider intelligence to be 
whatever intelligence tests measured. So, in this sense, interest was
in intelligence as a global concept. And yet, what Spearman called 
the anarchic theory of mental structure, a theory of extreme specifi- 
city of mental structure and function, was the prevailing view. The 
studies of William James, Thorndike, and others on transfer of train- 
ing had fostered the view that abilities were highly specific. The 
results of sensory-motor testing, described earlier, had much the 
same effect. Thus, we had the practical pragmatic interests in 
intelligence testing, on the one hand, and, on the other, even larger 
segments of the psychological field in which abilities and traits were 
viewed as highly specific. 

This lack of relevance of developments in the specific areas of
intelligence testing during these years in the United States,
curiously enough, does not seem to have a counterpart in Britain. In a
sense, the British psychologists continued more closely the tradition
of the laboratory that has been described. In part, this was due to
the impetus of Galton and a continued stress on the part of his
[...] on individual difference. In part, it was due to the statistical
[...] under Galton and [...].

The tests in use in Britain at the turn of the century were the
logical derivatives of earlier sensory and motor tests, but they
also included such measures as association tests, retention
measures, target aiming, card-sorting, and the like.

In an important paper published in 1904, Spearman (30) criticized
the previous methodological efforts on statistical grounds. For
example, many earlier workers had failed to use quantitatively precise
statements of the degree of correlation between tests; they did not
calculate the probable error, and they did not allow for errors of
observation. In addition, based upon his correlation between sensory
tests and estimates of intelligence, Spearman arrived at the
conclusion that “all branches of intellectual activity have in common
one fundamental function” (30, p. 284). Thus were launched
the beginnings of the thinking from which, a few years later, came
factor analysis. The British psychologists saw more clearly than
their American contemporaries the reasons that early attempts at
testing had failed.

Above all, they had something positive and challenging to work
with in factor analysis. Their general associationist background was
conducive to continuing this tradition. They continued an interest
in these measures, gradually including more and more material
relevant to the higher mental processes.

Factor analysis is a tool by the very nature of which you cannot
tell in advance what factors will emerge. True, the material was so
selected as to get at intellectual function, but the nature of the
technique required an analytic attitude. Nor were nonintellectual
factors entirely neglected. The pioneering factor analytic study of
Webb (36) in 1915 was based on ratings and yielded a factor which
seemed to be strength of character or will, called w. Burt (4), the
same year, briefly reported on the interrelation of ratings of
emotions. But now we must return to developments taking place in the
United States in the 1910s and 1920s.


Behaviorism and Objectivity of Measurement 

The appearance of Behaviorism, with its militant espousal of an
objective approach, had a profound effect on psychological thinking.
For our purposes, it may be dated by the appearance of the work
of John B. Watson, beginning in 1913 with his articles and
culminating in his 1919 publication, Psychology from the Standpoint of a
Behaviorist. Mentalistic terms, including subjective, became
epithets. The Russian reflexology, which came into being in the
immediately preceding years, is sometimes referred to as “Objective
Psychology,” after the book by that name published in 1910 by
Bekhterev.

Psychologists then found in objectivity a standard of science. No
longer did they have to struggle with mediate and immediate
experience, dependent and independent experience, and the
differences between the objective science of physics and the subjective
science of psychology. They could then use “objectivity” proudly,
as we do to this very day. In the recently received copy of the
supplement to the Psychological Review, there was a “Glossary of
Some Terms Used in the Objective Science of Behavior,” by
Verplanck (34), who did not even find it necessary to define objective
among the many, many terms covered.

The spirit of the times or the Zeitgeist, a term popularized by
E. G. Boring, had prepared the way for the appearance of an
interest in performance tests. An interesting example is the
Will-Temperament Test, a behavior measure, which fitted in with the
times. In 1919, June Downey (11) introduced a test for the
measurement of what she called will-temperament. Its nature was
intriguing, consisting largely as it did of handwriting samples under
different conditions and, thus, behavioral in nature. A sample of
writing was obtained at “ordinary” speed (for a baseline), as rapidly
as possible (to get a comparison with ordinary speed, on the theory
that those writing much slower than they can are subject to a load
or inhibition), in a different style (to measure flexibility), as slowly
as possible (to measure motor inhibition or control), and so on. The
test appealed to the desire for objectivity, so it was met with
enthusiasm. It was given trial after trial until about fifty studies were
performed, despite almost uniformly negative results from the
beginning. It was as if such a behavioral test as this could not fail to
work just because it was a behavioral approach. The test is also
noteworthy as the first major performance measure of personality.

One of the earlier performance measures of personality was the
study by Voelker (35) of “moral reactions to [...],” followed in 1923
by the Character Education Inquiry, with which are associated the
names of Hartshorne and May, who developed performance tests for
honesty, [...], inhibition, and persistence. Their findings of low
correlations among these specific measures are too well known to
pause over. They helped to accentuate the skepticism about tests as
measures of personality or traits and to lend support to the
anarchic view.

Behaviorism itself, with its emphasis upon S-R bonds and
personality as a bundle of habits, had helped to bring about this
skepticism concerning tests on the part of many psychologists. The
results of Hartshorne and May, even though behavioral, were
another invitation, akin to that furnished by the earlier results
of Wissler and Sharp, to see testing in a skeptical light.

Performance tests are logically and chronologically related to 
assessment procedures of the miniature life situation sort. They, too, 
have a long past despite their short history. The earliest statement 
of the potentials of this method is probably that of Galton (19). In 
1884 he wrote: 

Emergencies need not be waited for, they can be extemporized, traps, as it
were, can be laid. Thus, a great ruler whose word can make or mar a subject’s
fortune, wants a secret agent and tests his character during a single interview.
He contrives by a few minutes’ questioning, temptation, and show of
displeasure, to turn his character inside out, exciting in turns his hopes, fear, zeal,
loyalty, ambition, and so forth. Ordinary observers who stand on a far lower
pedestal, cannot hope to excite the same tension and outburst of feeling in
those whom they examine, but they can obtain good data in a more leisurely
way. If they are unable to note a man’s conduct under great trials for want of
opportunity, they may do it in small ones, and it is well that those small
occasions should be such as are of frequent occurrence, that the statistics of men’s
conduct under like conditions may be compared. After fixing upon some
particular class of persons of similar age, sex, and social conditions, we have to
find out what common incidents in their lives are most apt to make them betray
their character. We may then take note as often as we can, of what they do
on these occasions, so as to arrive at their statistics of conduct in a limited
number of well defined small trials (30, p. 182).

He goes on to offer specific suggestions, such as the following: 

The poetical metaphors of ordinary language suggest many possibilities of
measurement. Thus when two persons have an “inclination” to one another,
they visibly incline or slope together when sitting side by side, as at a dinner
table, and they then throw the stress of their weights on the near legs of their
chairs. It does not require much ingenuity to arrange a pressure gauge with
an index and dial to indicate changes in stress, but it is difficult to devise an
arrangement that shall fulfill the threefold condition of being effective, not
attracting notice, and being applicable to ordinary furniture. I made some rude
experiments, but being busy with other matters, have not carried them on, as
I had hoped (30, p. 184).

In view of its date of publication, 1884, it would
be possible to argue that this was the first proposal for an objective 
personality measure. 
Modern Assessment Methods

Modern assessment procedures, as in the stress interview, the OSS
procedures, and the Michigan VA trainee study, are apparently
moving out of a period in which they were enthusiastically accepted
and tried, to a period of skepticism about them. If there is to be a
period of synthesis, it is too early to predict its nature.


Personality Questionnaire 

I shall now turn to personality questionnaires in this search for 
objectivity. During World War I, Woodworth’s Personal Data
Sheet, or, as it was later called, the Psychoneurotic Inventory, was
developed. It contained 116 items derived from descriptions of
symptoms of neurotic patients (18). “Yes” or “no” responses were
called for and were scored by simple counting to arrive at a total.
Because the Armistice intervened, this inventory was not used
extensively in the military setting. Over the years, however, it became
the model to which later workers turned to select and develop items.

We have had since that time a phenomenal growth in the de- 
velopment and application of personality questionnaires. Macfar- 
lane (22), in a critique of projective testing, spoke of the rapidity of 
appearance of projective instruments as partaking of the nature of 
a virulent infection. This remark can be applied with equal force 
to the self-report personality questionnaire. I would roughly
estimate from the Buros reviews of mental measurements and other
sources that, either as single short questionnaires or in multiple
capsule form, at least 500 commercially available personality
questionnaires have appeared. A tremendous number of psychologists
(including the present writer) helped to develop these
get-knowledge-quick devices.

There is no question that personality questionnaires had their
period of enthusiastic acceptance. Their appeal was to be found in
their partial objectivity: a score could be derived on which
independent [...] could agree. Scoring did not then, nor does it now,
involve subjective judgment. Nevertheless, several very [...] sources
of subjectivity were present which, at first, were [...] objectivity.
Customarily, responses are not independent of the motivation of the
person completing the questionnaire, [...] deception, and [...]
unconscious [...]. There is another subjective factor, however, to
which detailed illustrative attention will be given. These
questionnaires are subjective in that they require interpretation of the
meaning of the questions asked by the tester.

Interpretive subjectivity for the person taking them is rampant
in most personality questionnaires. Consider an early study of
Benton (3). He interviewed subjects, after they had completed
questionnaire items, as to what they thought was meant by the items. He
found, for example, that the item “Do you take pride in your physical
appearance?” was answered by different subjects as if the question
meant any of the following: do you always feel proud, do you
sometimes feel proud, are you always careful, or are you sometimes
careful of your physical appearance? Similar results have been found
by others.

Instead of dealing with other more detailed and significant
findings, let me indulge in an anecdote from personal experience. The
psychological interview on the receiving line in a Naval Recruit
Training Center during World War II partook of the quality of a
verbally administered personality questionnaire, since sheer press
of time did not allow using that distinctive characteristic of the
interview, the follow-through probing of replies. Enuresis was a
rather common and disturbing symptom and, consequently,
precious time was taken to inquire about it. An affirmative or a
negative reply to the question “Do you wet the bed at night?” had to be
checked, since “Yes” might mean, “Yes, because fifteen years ago
at the age of six I had an accident,” while “No” might mean, “No,
I haven’t for two nights in a row.” Wording the question “When did
you last wet the bed?” was found to increase objectivity, in that a
better understanding of the intent of the question followed. In this
pedestrian, minute improvement we can see how objectivity improves.

Sometimes personality questionnaires are criticized as if objectivity were an absolute. In view of Thurstone’s work with the personality questionnaire, it is of interest to note that this is the position he took. He asserts flatly that such questionnaires are not tests in any strict sense, since tests are “objective procedures” (321, p. 353), with the implication that questionnaires are not. One may sharply separate tests from questionnaires, as Cattell does, without denying questionnaires some objective status. To Thurstone’s position one may take exception, as I have tried to do, in arguing that objectivity is a relative matter.

Most personality questionnaires, in my opinion, have proved to be unsuccessful in their tasks as scientific instruments. The indictments by Ellis (12) and Ellis and Conrad (13) in large measure seem justified. One may, however, argue on certain points, such as classifying all questionnaires together, when a review of the particular instrument may show more encouraging results. The Minnesota Multiphasic Personality Inventory, for example, fared considerably better in its reviews than other instruments.

Certain measures, particularly the MMPI, probably owe their continuing use and expanded value, despite the general failure, in part to the development of specific means of increasing objectivity.
The Lie Scale is one such device. In addition to indices of increas- 
ing objectivity with the MMPI, there is the intimately related fact 
that it is a more complete, complex, and intricate instrument than 
many of the other personality measures. 

Psychology is a broad subject, and other influences, perhaps running counter to the prevailing Zeitgeist (perhaps representing still another trend), which appeared in some measure in the late twenties, reached a considerably higher peak of visibility in the mid-thirties. I refer, of course, to projective testing.


Projective Techniques 

At first, it might seem that the present concern does not call for 
direct attention to projective techniques. Nevertheless, in our search 
for objectivity some reminder of the beginning of projective testing is necessary to place in perspective its developing influence upon objectivity.

Rorschach worked with his inkblots in Switzerland during the decade 1910-1920. Studies with the Rorschach Test in the United States can be dated from about 1930 with the appearance of Beck’s monograph (2). He, in turn, had been trained in Rorschach by David Levy. The Rorschach was not deliberately planned to test projection, despite its preeminence today as a projective test. Inkblots, in themselves, are nothing new or startling. Indeed, Leonardo da Vinci proposed for the training of painters throwing [...] of various colors against the wall, for the perception of [...] to see, and, significantly, he added, provided one wants to [...] so far as [...] research in the narrow sense is concerned. [...] proposed as a measure of visual imagination in 1893 by Binet [...] (38). Dearborn (9, 10), two years later, published [...].

The [...] of projection as a method of testing arose more [...] and independently. In 1935 Murray (23) [...] on the Thematic Apperception Test [...] (28). [...] Cattell published his Guide to Mental Testing, including a description of a projection test. Since it is probably not too well known to
American audiences, it will be described briefly. He called it a 
projection test, which, I believe, is the first time the term was 
applied directly to a test. It consists of 74 items in which each item 
has three alternatives as in the following instance (33, p. 71): 

John strained every nerve to beat the others because: 
he was determined to be top 
his father wished him to succeed 
he needed the scholarship. 

The most appropriate of the three endings was to be checked on the
assumption that one person will project his own chief impulses onto 
the ending chosen. If self-assertive, he would be likely to choose 
the first; if submissive, he would prefer the second. The items were 
developed so as to give scores on Self-Assertive versus Submissive, 
Cautious versus Bold, Acquisitive, Gregarious, Curious and De- 
pendent tendencies, adapted from McDougall’s list of instincts. 
Standardization was not carried out and not too much use has 
been made of the test. In today’s perspective, it could be called a multiple-choice, sentence-completion test. It is pertinent to my theme to indicate that, if standardized, this could have been an objective test in the sense that scoring was objective. The more detailed and explicit formulation of the projective hypothesis by L. K. Frank (16) appeared in 1939. This was the first major source of knowledge about projection to become well known to psychologists.

Thurstone considered the projective procedures “the nearest approach to personality tests” in revealing personal idiosyncrasies. He asked only that the test be unstructured for the subject but well structured for the psychologist, since with this structure it could be objectively scored. However, he went on to add, if the interpretation were as unstructured as the test, it would be useless for
scientific inquiry. Structure, from the point of view of the examiner, 
may be equated with objectivity. Rorschach inkblots seem highly 
subjective, but experts can prepare independent interpretations 
which agree on essential particulars. This form of objectivity is 
one of the grounds on which the Rorschach is defended. 

The influence of projective techniques upon objective personality testing involves diametrically opposed effects. Undoubtedly, this kind of measurement increased subjective, impressionistic, intuitive trends in psychology. The heady wine of its multidimensional character; its relation to dynamic theory, particularly psychoanalysis; its enthusiastic reception by psychiatric colleagues; and its usefulness in clinical settings all contributed to this anti-objective trend. Yet, without its challenge to objectivity, psychologists would probably not have seen the possibilities of increasing the scope of objectivity to include improving the objectivity of the psychologist himself. In considerable measure, whether we view the use of projective techniques sympathetically or as an irritant, it has forced us to re-evaluate and broaden our meaning of objectivity.

The many validity studies of the Rorschach that have produced 
negative results have given for the third time in the last fifty years 
an excuse to be skeptical of testing, this time of projective personality testing. Since this is a current skepticism, no one can say “what happened next.” But something will happen, and I would
suggest that it will be further objectification of projective tech- 
niques, but without disregard of the complexity and subtlety these 
devices permit. In one of the later chapters of this book, 
Holtzman focuses attention on the problems of objective scoring of 
projective techniques, while preserving their underlying purpose. 

What Is an Objective Test? 

Now that the historical survey has been completed, I would like 
to consider present-day thinking about objective personality testing. 
In order to be able to present a cross-section of present-day con- 
ceptions of objective personality testing, the authors of this sym- 
posium indicated what they considered to be the meaning of 
“Objective Approaches to Personality Assessment,” with special 
emphasis upon the qualifying term, “Objective.”

Bass considered objectivity to be complete independence from examiner effects, or, as he also put it, zero variance due to the examiner. Berg referred to scorable, fairly clearly structured tests for which scoring would be identical if performed by competent persons. Edwards emphasized the rigorously defined method of scoring. McQuitty considered it to mean the isolation of consistent individual differences in such a manner that numbers can be applied, resulting in a similar classification or measurement of behavior by different users of the approach. Pepinsky, in relating objectivity to an approach to personality testing, disclaimed use of the term but believed what is meant is twofold: (1) minimization of errors of observing and recording, and (2) minimization of variability in the task conditions on separate occasions (not, he adds, minimization of stimulus ambiguity or uncertainty for the subject).

In varying degrees and either implicitly or explicitly, many of [...]

As I see it, Super is saying that objectivity is either clarity or structure, suggesting that these terms are more clear in the present context than is objectivity, though not denying meaning to objectivity. His personal preference, he goes on to state, is the structured-unstructured continuum, so far as classification of tests is concerned.

Hunt, as might be expected from his present interests in objectifi- 
cation of clinical impression, does not limit himself to test settings. 
He writes: 


I interpret “objective” as pertaining to “public” rather than “private” information. Thus the data of introspection are private until turned into some form of report, when they thus become public, since the forms of report, language or behavioral, can be handled as public, verifiable (by others) phenomena. “Objective” has many parameters, loosely the clarity and specificity of definition of the phenomena, its duplicability, its control for experimental observation, its statistical amenability, etc.


This is a still broader definition, and very close in spirit to Feigl’s 
account of intersubjective testability. 

Hathaway gives the most detailed analysis, but I am taking the 
liberty of quoting him verbatim. 


I believe I am correct in placing our local emphasis in definition of the word objective upon the qualities of reproducibility and most of all upon the absence of intervening interpretation between behavior of the subject and the [...] transmitted [...] to a third person. Data are objective when they are transmitted verbatim from [...] to others who may then interpret them. The [...] Rorschach cards are objective items, but they become something [...] ([...] improperly called subjective) when they are classified by a [...]. The MMPI items checked [...] are objective information, and these remain objective when put into scales; [...] of the [...] or profiles is no longer objective. [...] is an objective item when presented verbatim but loses objectivity as interpretation or condensation or expansion occurs on the part of the [...]. There are intermediate situations. A series of [...] certain Rorschach responses occur with a greater frequency [...]. [...] preserving objectivity, one could then classify a given response [...] responses showing a certain degree of frequency. The frequency score is, therefore, an objective item. It is to be noted, however, [...] responses [...] in slightly unusual form, although elements of the [...] are frequent. The [...] could exercise freedom [...] one or an infrequent one. When such examiner freedom [...], the resulting score loses objectivity. Similarly, an MMPI administered [...] under deviant conditions that may not be mentioned in presentation of the objective data from the responses also loses objectivity. [...] roughly [...].
I have not developed the idea of reproducibility, but it is inherent in what I have been saying. One does not require that the objective score be reproducible in exactly the usual sense of reliability but rather that there be possible a hypothetical construct representing a reproducible element in the subject. This construct is not possible if the product of the situation represents some interaction of the examiner with the subject or some aspect of the examiner’s psyche.

Most projective devices (and I tend to use test as almost completely implying objectivity) have traditionally been much less objective than devices that permit the patient to make responses that can be treated by clerical means. It is unfortunate that objectivity has been tied to “paper and pencil” and to the idea of formulated items such as in the MMPI. I believe that we would benefit from the attempt to extend objective measurement to include not only the objective aspects of projective devices, which has been partly developed, but also objective ways of treating interview material and free behavior as this may be observed by others.


Projective and Objective Tests are not Dichotomous 

Interest in searching for objectivity is by no means confined to the authors of this book. A considerable variety of opinion has been expressed elsewhere. One that I consider especially pernicious, when the distinction is made without qualification or explanation, is that between objective and projective tests, treating them as if they were mutually exclusive. One rather widespread systematic error has been to contrast projective tests with all other tests, to which, unfortunately, we have sometimes applied the undeserved label of objective tests. For example, in some volumes of the Annual Review of Psychology, two of the major sections on diagnostic testing have been labelled projective and objective. Many “objective” tests are not objective in any of the senses in which we find the word to have been used, and projective test materials may be treated objectively. If we must have only projective tests and something else, which I do not believe is the case, the lame category of nonprojective is a shade better because, at least, it does not make an invidious comparison.

There has been some involvement, spurious in my opinion, of the question of objective versus projective with the nomothetic and idiographic approaches. Beck (2) has asserted that objective tests are limited to the “subpersonality” in the course of discussing the question, or pseudo question if you will, of the idiographic and nomothetic approaches. It may be that projective tests have more adherents from those with an idiographic approach and objective tests have more from the nomothetic camp, but it does not follow that objective tests cannot be used as measures of personality. To be sure, single objective tests or test items will certainly fail in this task. Factor and pattern analyses, as in the Cattell and McQuitty approaches, are surely approaches to the total personality, although with the inevitable loss of individuality or uniqueness that accompanies any theoretical formulation of personality, including that in projective formulations.


What is Tested? 

White’s discussion (38) of what is tested by psychological tests is pertinent. He points out that psychological tests can no longer be regarded as inducing specimens or samples of performance of restricted functions. The samples may be conceived of as inducing, say, problem-solving capacity, but many other characteristics of personality also contribute. He argues that we can never arrange a situation in which one variable alone is tested. For example, the problem-solving measure may also tap frustration tolerance, anxiety control, and level of aspiration. Tests consequently supply overlapping information. In line with this, White goes on to propose that we must use test batteries, since we can no longer pin our faith on single tests. By use of a test battery, there is an increase in knowledge, not merely in an additive fashion, but in geometric progression. In the same vein he proposes multiple examiners. He speaks, in this connection, of psychological tests not yet being so objective as to dispense with this safeguard. In a sense, then, the whole discussion is a plea for objectivity, but objectivity at a high enough level of complexity so as not to do violence to the complexity of personality.

This point of view can be seen in contrast with the position of 
factoring psychologists who are interested in purifying their measures so they are free of what could be called contamination. But one man’s contamination is another man’s extra premium of subtlety.
A not inconsiderable group of psychologists accept this position, 
considering that it is not only futile to try to purify their existing 
instruments, but also that it is quite valuable, especially in clinical 
settings, to have these additional premiums of information. 


Some Classifications 

There has been some interest expressed in the classification of 
tests and personality measures. Rosenzweig (27) has classified personality measures into objective or overt, subjective or covert, and projective or implicit levels of reference. He, however, explicitly indicates that these levels are not too exclusively associated with one or another of the diagnostic methods. In fact, the same instrument may supply information at all three levels. They are not defined so as to give objectivity to only one level. Campbell (5) has developed a classification of tests based on three dichotomies. In the first dichotomy he contrasts objective tests, for which the subjects understand there are correct responses, and voluntary tests, in which, in one fashion or another, the subjects are informed that there are no right or wrong answers. The other dichotomies are direct versus indirect, having to do with the subject’s understanding of the purpose of the test, and free-response versus structured, having to do with the usual distinction made between them, but from the point of view of the subject. He would classify tests in terms of these three dimensions simultaneously, as in the voluntary, indirect, free-response type, which would include the Rorschach and the TAT,
and the voluntary, direct, structured type which would include the 
MMPI, and so on. His use of objective runs counter to several of 
the meanings of objective we have considered. In large measure, 
this arises from his use of objective in a phenomenological orienta- 
tion— the phenomenologically objective environment. Accuracy and 
error, as he says, are in the subject’s mind. Rosenzweig, in contrast, 
refers to the psychologist’s orientation, not the subject’s. Perhaps 
some classification uniting both, the subject’s and the examiner’s 
frames of reference will give us an even more adequate classification 
than do Campbell’s and Rosenzweig’s when considered separately. 

I am reminded in this connection of George Kelly’s witty remark 
that, “When the subject is asked to guess what the examiner is thinking, we call it an objective test; when the examiner tries to guess what the subject is thinking, we call it a projective device” (20).

Space limitation makes impossible an exhaustive survey of the remainder of modern literature relevant to the question of objectivity in personality testing. Certain other selected references might be mentioned. Frank (17) contrasts the psychometric and projective approach. In the context of norms for the TAT, Rosenzweig (26) has offered a discussion of how they help in the process of objectification. Levinson (21) compares and contrasts projective and ability tests. Rapaport (24) discusses the principles underlying nonprojective tests of personality. Allport (1) considers the advantages of straightforward, direct methods over projective techniques. In his various publications, Eysenck deals [...] with [...] objective personality tests in a [...]. Stephenson (31) has devoted considerable effort to demonstrating the objectivity of his variety of Q-technique.
[...] of the enthusiastic and transient support of fads. In fact, Jensen (17, p. 306) was moved to write a formula for a psychological fad, which consists of making available an easy-to-use measuring device “with a significant label and fascinating content.” The reasons for this seem clear: a readily used and fascinating tool prompts people to make use of it. When it has some bearing on a current theory, this use becomes not only interesting but academically respectable and even, perhaps, prestigeful.

Perhaps one reason why theories take on something of the nature of the fad is that the instruments used in studying them at first seem relatively simple, suggesting that one can readily test the underlying theory. In due course, research proves that neither the instrument nor the theory is as simple and straightforward as it seemed, and the fringe researchers drop them, moving on to a newer theory and a newer test. And so, as Jensen says, the theory does not die; it fades away. That the fading process is only relative, not complete, is shown by the emphasis put on introversion-extraversion in Eysenck’s current work (9). An historical example is the revival of interest in expressive movement, resulting from Allport and Vernon’s work twenty-five years ago (2), about ten years after Downey’s work with it (7) had actually faded.

I feel humble in dealing with theories for another reason, for I work, not as a theorist, but as an empiricist. As an empiricist, however, I feel the need to organize facts, to see what they add up to, and to be guided in planning my further work by the perspective thus gained. This, I have found to my surprise, makes me, in the true sense of the term, a theorist. For theory, so I am informed by theoretical theorists, is nothing more than the attempt to explain the relationships between sets of facts. I do that, and so do all of us, more or less consciously and in more or less sophisticated ways. The result is a rationale, a set of assumptions, underlying every instrument we use, every technique we employ. These may not be integrated into theoretical systems, but they constitute theories. Perhaps here is the difference between the theorist as we generally conceive of him and the empiricist: the former constructs a system; the latter neither builds nor adopts a system but gets along with a less well organized body of related facts.

In preparing this paper, then, I found that I was dealing, not with theoretical systems, but with limited theories, with the more specific assumptions and hypotheses underlying methods of assessing personality. Most test authors have not consciously developed their tests in terms of a systematic theory of personality; rather, they have developed their instruments on the basis of more specific hypotheses concerning the ways in which personality manifests itself. Underlying these hypotheses, of course, there have been theories as to both the structure of personality and its dimensions. The structure issues have generally been those of the organismic whole versus the atomistic constellation of traits; the dimension issues have been questions concerning the traits that make up the organism or the constellation.

Since our concern is with the measurement of personality, rather than with personality theory as such, and since I am perhaps a measurement practitioner but certainly not a personality theorist, I shall address myself to the first type of theory, that is, the lower-order theories and hypotheses governing the development of personality measures. And, since our focus here is on objective approaches, I shall deal at greater length with that end of the continuum, treating the more subjective or unstructured methods only enough to place all of these approaches in perspective.


The Observation Approach 

Performance. One approach to personality assessment is through observation of performance, of the personality in action. It assumes that people are what they do. The medium in which the personality characteristics manifest themselves is thus overt, observable behavior; the method of assessment is observation; the measure resulting is a characterization or a rating. Let us look briefly at each of these categories.

Media. The medium in which the performance takes place may be either of two types: a life situation, or a miniature, that is, an artificial or manufactured, situation. In the life situation test the basic assumption is that significant personality traits manifest themselves in everyday behavior, and that observations of this behavior may be recorded to obtain a meaningful picture of personality functioning. In the miniature situation test the assumption is that one can set up situations that bring out important traits more quickly and more completely than in everyday life, for closer observation by more highly trained observers.

Method. The method of measurement, recorded observation, makes assumptions concerning the observer and the situation. Basic among these is the assumption that observers can identify significant behavior, noting it, classifying it, and even judging its strength.

Measures. These observations may be recorded in several ways. In sociometrics, participants in the situation record their choices of friends or companions for real or fictitious activities, thus in effect
summing up their observations; it is assumed that people are aware 
of their differential preferences and are willing to express them 
under appropriate conditions. In nominations, participants name 
the persons whom they believe fit each of a series of labels, roles, 
or personality descriptions; here, too, they in effect summarize 
observations of performance. The assumption is that people notice 
individual differences, particularly differences in roles played in 
social groups, and are willing to share these observations. In ratings, 
either participants or observers rate participants in the situation for 
characteristics that are believed to be important and that are con- 
sidered likely to manifest themselves in such a situation. The as- 
sumption is that the rater will be able to identify the trait or 
behavior in question, and be able to make a judgment concerning 
the frequency or degree to which the subject manifests it. In be- 
havior frequency counts, the observer records the incidence of types 
of behavior or actions which are expected to have significance, the 
assumption being that the frequency of types of behavior in that 
situation is indicative of behavior tendencies in other situations 
and reflects underlying personality dimensions. 

Evaluation. It would be worthwhile, but this is neither the time 
nor the place, to evaluate in detail the construct and predictive 
validity of each of the types of measures and media used in assessing 
personality through performance. At the risk of overgeneralizing, 
and without citing evidence which has been well summarized by 
Heyns and Lippitt (14), by Bass (4), Lindzey and Borgatta (24),
Flanagan and others (11), and by Hollander (15), I shall nevertheless 
express in a few brief statements my understanding of the status 
of each of the measures and media used, of the method (observa- 
tion), and of performance tests of personality in general. 

First, the measures. Behavior frequency counts have generally 
proved useful when appropriate dimensions of behavior have been 
identified for personality assessment, as pointed out by Bass (4), the Ohio State group (31), and others working on the initiation of structure. Ratings have proved useful as global measures, but not as
measures of specific traits, for global measures seem to reflect either 
the success of or liking for the subject. Nominating and sociometric 
techniques (15, 24) have stood the test of experimentation rather 
well, the former for the assessment of social roles and the latter, like 
ratings, for the assessment of social acceptance. 

Secondly, the method. Observation puts a premium on the person doing the observing if little structure is provided, and on rationale and training if much structuring is done (14). In this method, the person is the instrument more than the devices which he uses to record his observations when structure is minimal, and less than his devices when his procedural directions are so refined as to make a machine of him. The question has typically been that of the clinician as a tool (28), a topic of considerable current interest with which others will deal later in this book. As Heyns and Lippitt point out in the Handbook of Social Psychology (14, p. 403), making the clinician rather than his procedures central is a mistake in the use of observational methods.

Finally, in this brief evaluation of observation of performance in the assessment of personality, we come to the media, the life situation and the miniature situation. The former has the advantage of realism, for here the test is life itself, but it has the disadvantages arising from the psychologist’s inability to control the situation. The sampling of behavior may be poor, and conditions may make the recording of observations difficult, as exemplified in studies made of personality factors in survival. In those studies, observations of any one air crew in the Strategic Air Command living under survival conditions could be made under either winter or summer conditions, but not under both. Observers could watch all of the crew some of the time and some of the crew all of the time, but not all of the crew all of the time. The miniature situation test, on the other hand, tends to lose verisimilitude both in content and in the motivation of participants for some purposes, although for others verisimilitude can be achieved. The leaderless group discussion can, for example, capture both the content and the spirit of many executive situations.


Personality Projection 

Projection 
[...] might follow George Kelly (20, p. 335) in describing them as
the fluid-blot people, the human-picture people, and the disjunctive- 
sentence people. The media differ in two ways. They differ in the 
amount of structure provided, as in the amorphousness of an inkblot 
and the focus provided by the sentence-stem, as well as in the use 
of stimuli which may evoke either impersonal or personal content. 
Inkblots, for example, are more likely to elicit impersonal responses 
than are the cartoons of the Rosenzweig PF test. 

Method. The method of test scoring and interpretation used in 
this kind of assessment may be contrasted with the recorded ob- 
servation method used in situation tests. The observer, we have 
seen, is a cross between a clinical instrument and a machine because 
he collects, sorts, and records data; in projective testing, on the
other hand, it is the medium, the test, that collects, sorts, and provides
a record of responses. Since important further sorting of data
takes place later in both types of testing, it may be important to
illustrate what I mean here. In situation testing, the observer makes
decisions as to what behavior to record, sorting out responses such
as sneezing as "not-to-be-recorded," and others, such as asking a
neighbor a question, as "to-be-recorded." In projective testing, the
test, not the examiner, does the first sorting, for the directions ask
the examinee what the inkblot might be and the examiner merely
records the responses to that stimulus; this is even clearer in the 
case of incomplete sentences tests, in which the examinee reads 
each question and writes his response himself. Here the examiner 
makes no decisions as to what behavior to evoke or to record. The 
examiner plays his important part in analyzing the recorded data, 
both in the inquiry of a Rorschach examination and in the scoring 
of Rorschach, TAT, and Incomplete Sentence Test protocols. This 
may or may not be done by the observer in a situation test, but 
the scoring processes are basically similar in both situational and 
projective tests once the data have been recorded. They may be 
quite objective, as in the identification and counting of forms or 
of structure-initiating responses, or they may be very subjective, as 
in the making of a global rating of leadership promise. Scoring may 
use a Gestalt approach and focus on stimuli to which responses 
are made, as in scoring form and color responses in the Rorschach, 
or it may use a psychoanalytic approach and deal with the content 
of responses, as in determining sex identification in the TAT.

Measures. This leads to the question of the measures derived 
from projective techniques of assessment, and to theories that have 
been much debated in the literature. One theory is the organismic,
which is used to justify a global interpretation of the projective
protocol. Several other types of theory, ranging from neobehaviorism
to psychoanalysis, are used to support the objective scoring of
the protocol through the identification and counting of well-defined
responses. These scores are in turn interpreted in terms of their
hypothesized, or occasionally their demonstrated, significance. It
would be possible to dwell on these theories at some length, but
that is not the purpose of this paper.
Evaluation. Perhaps the best way to evaluate briefly is to see
what the evaluators say, rather than to look at each medium,
method, and measure. We can do so by glancing at the Annual
Reviews for three recent years. In 1954 Lowell Kelly wrote
(18, p. 288): "The curious state of affairs wherein the most widely
(and confidently) used techniques are those for which there is little
or no evidence of predictive validity is indeed a phenomenon
appropriate for study by social psychologists."

In 1956 Cronbach wrote (6, p. 173) that "Assessment in the OSS
style has now been proved a failure," and he cites, among other
studies, the Holtzman-Sells analysis of the predictive value of clinical
analyses of projective protocols (16). He concluded that "Assessment
encounters trouble because it involves hazardous inferences," in
which assessors go considerably beyond known relationships
between predictor and criterion variables. He quotes Symonds and
other students of projective methods to the effect that there is little
theoretical basis for expecting fantasy, as revealed by projective
techniques, to be directly related to overt manifestations of
personality such as academic success or work proficiency.
In the 1958 Annual Review (20) George Kelly is cautious, but the
impression resulting from the specific studies that he cites but
refrains from synthesizing is not good. Cronbach's earlier conclusions
seem good, and even if it does in due course develop that projective
techniques have construct validity, we must ask of what
value the constructs are if they do not help predict behavior.
A few years ago someone suggested that if we were honest we
would stop using projective techniques in assessment, since they
have not been shown to have predictive validity. After a review
of the Rorschach (29) which led to the conclusion [. . .], the
Rorschach was abandoned as a clinical tool at the Maudsley Hospital.
Although we have considered dropping the skills course in
projectives in my own institution, we still require that all clinical,
all counseling, most personnel, and all school psychologists
devote a large block of time to acquiring some competence with
these projective techniques, the utility of which is unknown. We
have agreed that they have no validity, but we retain the
requirement. We do this for three reasons: 1) the unsatisfactory but
practical consideration that such psychologists are expected to have
these skills and are likely both to feel and to be handicapped if they
do not; 2) the fact that they can learn something useful about clinical
interaction by studying these procedures; and 3) the hope that
familiarity with these methods may yet provide psychologists with
a basis for some major break-through in the field of personality
assessment.

Self Description 

We come now to our third and last type of personality assessment
procedure, self-description. George Kelly (20, p. 332) describes this
type of method as the one in which "the subject is asked to guess
what the examiner is thinking," as contrasted with projective
techniques, in which "the examiner tries to guess what the subject is
thinking." This is the oldest of our three types of approaches, and
the one which has most readily lent itself to experimentation and
research. It best qualifies as an objective approach to personality
assessment, being near the upper end of the structured-unstructured
continuum. By some definitions of that term I might legitimately
have confined my paper to this type of device. While it seemed
wiser to try to gain perspective by reviewing all three types, I have
saved this type for last with the aim of dealing with it in somewhat
more detail, under two headings: trait lists and biographical
inventories.

Media I: Trait Lists

The media for self-description in assessment work have typically
been behavior or trait lists in the form of the personality inventory,
the check list, and the rating scale. These, according to such diverse
students of personality as Gordon Allport (1) and Frederick Wyatt
(35), are appropriately used because of the importance of conscious
motivation in normally well-integrated people. They are direct
methods in that they ask the individual to describe himself as he
sees himself. George Kelly (19) and Leary (22) have recently, with
quite different approaches, made considerable use of
self-descriptions. The various underlying theories of personality are too
numerous and too diverse for discussion here, but it is relevant and
possible to comment on the general theoretical acceptability of these
methods.

Because self-description was found, in Woodworth's work in 
World War I, to work reasonably well, this medium was widely 
exploited during the pragmatic, empirical, behavioristic period that 
followed. With greater sophistication in assessment and personality 
theory, self-descriptive methods came into disfavor among
psychologists. The devices developed by Woodworth, Allport, Laird,
Bernreuter, Bell, and others were still used, for lack of better methods
of personality appraisal, but generally with recognition of their
weaknesses and with some apology for not having something better.
Thus in 1944 Maller wrote, in Hunt's symposium on personality
(27, p. 180), “It is the psychologist’s dilemma to choose between the 
standardized questionnaire which is broad in scope but of doubtful 
validity and the performance record which is obviously valid but 
of narrow scope.” 

The dissatisfaction with the self-descriptive inventories which
resulted from the unbridled empiricism of the 1920's led not only
to work with other methods but also to better empirical work with,
and better theorizing about, self-description. Thus Guilford
embarked upon the twenty-year-long program of refinement of
personality inventories through factor analysis, which has taken current
form in the Guilford-Zimmerman Temperament Survey. Hathaway
built on Rosanoff's theory of personality as well as on Minnesota
empiricism in selecting items for, and in ascertaining the
concurrent validity of, the Minnesota Multiphasic Personality Inventory.
Edwards developed the Personal Preference Schedule by combining
Murray's need theory with psychometric improvements such
as the equating of items for social desirability, and Bills developed
his Index of Adjustment and Values with the help of self theory both
in devising his scoring system and in designing validation
experiments.


Method. Early work with personality inventories relied on a
combination of content and construct validities. Items were written
or selected because they described symptoms which were believed
to characterize types of adjustment; this was content validity
as defined by the APA Committee on Test Standards (3). Items
were also selected on the basis of internal consistency, that is, of
agreement with the total score for the scale in question; this was
construct validity as now understood. It was generally assumed that
if the item described a symptom that had been observed to
characterize a type of maladjustment, such as neuroticism, and if
answers to it tended to agree with total score for that trait, then
validity was demonstrated. It is interesting to note that these two
types of procedure fell into some disrepute until the APA
Committee gave them a name, and that their new respectability was not
much dimmed by the greater stress put by that Committee on
predictive validity!
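The internal-consistency check described above (an item is kept if answers to it agree with the total score for the scale) can be made concrete as a corrected item-total correlation. The sketch below is an illustration under stated assumptions, not a procedure taken from any inventory named here: items are assumed to be scored 0/1, each item is correlated with the total of the *other* items, and the 0.30 cut-off and the function names are hypothetical.

```python
from statistics import mean, pstdev


def pearson(x, y):
    """Product-moment correlation of two equal-length sequences."""
    sx, sy = pstdev(x), pstdev(y)
    if sx == 0 or sy == 0:
        return 0.0  # a constant variable carries no covariance information
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)
    return cov / (sx * sy)


def corrected_item_total(responses):
    """Correlate each item with the total of the remaining items.

    responses: one row per person, each row a list of 0/1 item scores.
    Leaving the item out of its own total avoids inflating the correlation.
    """
    totals = [sum(row) for row in responses]
    correlations = []
    for j in range(len(responses[0])):
        item = [row[j] for row in responses]
        rest = [t - i for t, i in zip(totals, item)]
        correlations.append(pearson(item, rest))
    return correlations


def consistent_items(responses, cutoff=0.30):
    """Indices of items whose corrected item-total correlation meets the cut-off."""
    return [j for j, r in enumerate(corrected_item_total(responses)) if r >= cutoff]
```

As the following paragraph argues, passing such a check only suggests that the items hang together; it does not show that the scale predicts anything outside itself.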

Experience and experiment showed, however, that construct
validity is merely suggested, not proved, by item-score correlations
and item inspection, and that it is not equivalent to empirical
validity. Scales purporting to measure neuroticism, for example, were
found by Landis and Katz (21) to contain some items which were
answered by neurotics in the normal way, and by normals in the
neurotic way, more often than in the expected manner. And the
scales were found not to have appreciable concurrent or predictive
validity, in that they failed to differentiate effectively among groups
of people who were known at the time of testing or later to differ
in significant and presumably relevant respects, such as
neuroticism, type of psychosis, social role on a college campus, or
occupation. This led to the decline of interest in the content and construct
method of developing self-descriptive instruments which
characterized the 1940's, and to less emphasis on the assumption that people
can and will accurately describe their own traits.

One of the further outcomes has already been mentioned: it was
a greater interest in empirical validation, in both concurrent validity
and predictive validity as now understood. The best example of
this approach is the MMPI, in which McKinley and Hathaway set
out to devise a self-descriptive inventory that would differentiate
between various types of maladjusted and disturbed patients, and
between these persons and normals. But it was recognized that, in the
first writing or selection of items, it was important to have some
kind of guide, that is, a theory which would generate hypotheses
concerning differences.

A second outcome, also mentioned, was thus an emphasis on a
higher order of theory. Theory of a very low level had been tapped
in earlier work: lists of symptoms characterizing each group were
examined for suggestions as to items. A higher order of theory was
now brought into play, however, by the demonstrated weakness of
the symptom method: it was the use of a theory of personality
organization. In the case of the MMPI it was Rosanoff's theory that
provided a framework for the inclusion or exclusion of existing items,
and for suggestions as to additional types of traits and of behavior
which might be included. Item-score correlations are obtained, or
items are factor analyzed, by some investigators in order to establish
internal consistency or to validate theories concerning personality
structure. But the ability of total scores to differentiate between
criterion groups is also checked, and in addition item-criterion
correlations are often obtained in order further to purify the scales.
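Item-criterion correlations of the kind just mentioned can be illustrated with the phi coefficient, the product-moment correlation between two dichotomies: item endorsed or not, and member of the criterion group or not. This is a minimal sketch under assumptions that are not in the text: 0/1 item scores, a 0/1 criterion-group indicator, and a hypothetical retention threshold of 0.20. Each retained item is keyed in whichever direction is more typical of the criterion group.

```python
from math import sqrt


def phi(item, group):
    """Phi coefficient between a 0/1 item and 0/1 criterion-group membership."""
    a = sum(1 for i, g in zip(item, group) if i == 1 and g == 1)  # endorsed, criterion group
    b = sum(1 for i, g in zip(item, group) if i == 1 and g == 0)  # endorsed, comparison group
    c = sum(1 for i, g in zip(item, group) if i == 0 and g == 1)  # not endorsed, criterion group
    d = sum(1 for i, g in zip(item, group) if i == 0 and g == 0)  # not endorsed, comparison group
    denom = sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return 0.0 if denom == 0 else (a * d - b * c) / denom


def purify_against_criterion(responses, group, threshold=0.20):
    """Return {item index: keyed response} for items whose |phi| meets the threshold.

    The keyed response (1 or 0) is the answer more typical of the criterion group;
    items with a weaker relation to the criterion are dropped from the scale.
    """
    key = {}
    for j in range(len(responses[0])):
        r = phi([row[j] for row in responses], group)
        if abs(r) >= threshold:
            key[j] = 1 if r > 0 else 0
    return key
```

An item that everyone endorses yields a zero denominator and is treated as carrying no information, which is why the guard returns 0.0 there.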
A third outcome was a recognition of the fact that self-description
cannot and need not always be taken at face value. Empirical
validation can be used to strengthen construct validity, as in the
MMPI, but it can also be used to avoid construct validity as an
issue, as in Strong's Vocational Interest Blank. The success of purely
empirical self-descriptive instruments, unlike that of those having
a theoretical basis for item selection, left unanswered questions
concerning the reason for the success of the empirical approach.
And hence the fourth and, so far, final refinement in the
development of self-descriptive methods. Referring specifically to Strong's
Blank but making a point more broadly applicable, Bordin
suggested (5) that the reason for the validity of self-descriptions lies
in the fact that a self-description reflects a self-concept, and that
self-concepts have a directive effect on behavior. Thus the man
who describes himself as friendly may not actually be friendly, but
his behavior does tend to resemble that of other persons whose
constellations of self-ascribed traits are like his. The man who sees
himself as friendly, active, and alert may not actually be friendly,
active, and alert, but he is likely to act in the same way as others
who describe themselves in the same way [. . .]; his self-concept is
similar, and the associated behavior tends to be similar.
old self-other-ideal technique (34). In these, self-ratings are
contrasted with ideal ratings, or favorable self-descriptions are
compared to unfavorable self-descriptions, in order to obtain a
measure of discrepancy between self and ideal, or of self-acceptance.
Some work has been done with still another type of measure
derived from self-descriptions, a measure of self-consistency or
integration such as McQuitty (26) sought to derive by analyzing
the congruence of self-attributed traits. The reasoning is that the
integrated, self-consistent person tends to attribute to himself only
traits or behaviors that tend to be associated or that are compatible
with each other, whereas the conflicted or unintegrated person will
attribute to himself traits that are incompatible with each other.
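A self-ideal discrepancy score can be computed in several ways. The sketch below shows two simple indices, the sum of absolute differences and the Euclidean distance between the self and ideal rating profiles. It assumes that both profiles rate the same traits, in the same order, on the same scale; the choice of index is illustrative and is not claimed to be the one used in the studies cited.

```python
from math import sqrt


def self_ideal_discrepancy(self_ratings, ideal_ratings):
    """Return (sum of absolute differences, Euclidean distance) between two profiles.

    Larger values mean a wider gap between how a person rates himself and how he
    would ideally like to be, i.e. lower self-acceptance on this kind of measure.
    """
    if len(self_ratings) != len(ideal_ratings):
        raise ValueError("profiles must rate the same traits")
    diffs = [s - i for s, i in zip(self_ratings, ideal_ratings)]
    return sum(abs(d) for d in diffs), sqrt(sum(d * d for d in diffs))
```

The Euclidean form weights one large gap more heavily than several small ones, which may or may not suit a given use.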
Evaluation. In evaluating the measures, methods, and media that
have been used to obtain self-descriptions, it seems safe to say that
we have finally developed two approaches that lead to valid results.
One might be called the group difference method, as used by
Hathaway with the MMPI. It consists of starting with a well-defined
group, be it clinical or occupational; of developing a theoretical
model of that group from whatever data are already available, a
model that serves as a guide in item selection or item writing; of
purifying the scales to which items are assigned by internal
consistency or factor analysis methods; and of empirically validating
the self-descriptions against concurrent or predictive criteria. The
second method might be called the generalized model method, and
is illustrated by Cattell's work with the 16 PF Test. The person who
uses this method starts with a theory as to the significant dimensions
of personality, which may be quite empirical in its origins; selects
or writes items according to this theory; establishes the internal
consistency of the scales to which these items are assigned; and
validates them empirically by establishing the existence of
hypothesized differences in selected nosological or other groups. When
one or more of the steps has been omitted, one may well be
suspicious of the validity of a self-descriptive instrument. Some of the
contemporary personality inventories and adjective checklists have
been developed by these methods, with results which appear much
more promising than those of the less systematic and less
thoroughgoing approaches used prior to World War II.
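The last step of both methods, establishing a hypothesized difference between a criterion group and a comparison group on a scale, is often summarized today as a standardized mean difference. The sketch below assumes that scale scores are already available for each group and uses the pooled-standard-deviation form of the index; this is a modern convention chosen for illustration, not one named in the text.

```python
from math import sqrt
from statistics import mean, variance


def standardized_difference(criterion_scores, comparison_scores):
    """Mean difference between two groups in pooled-standard-deviation units.

    Positive values mean the criterion group scores higher on the scale.
    """
    n1, n2 = len(criterion_scores), len(comparison_scores)
    if n1 < 2 or n2 < 2:
        raise ValueError("each group needs at least two scores")
    pooled_var = ((n1 - 1) * variance(criterion_scores)
                  + (n2 - 1) * variance(comparison_scores)) / (n1 + n2 - 2)
    return (mean(criterion_scores) - mean(comparison_scores)) / sqrt(pooled_var)
```

A difference in the hypothesized direction supports the scale; a difference near zero, or one in the wrong direction, is the kind of failure that the Landis and Katz findings illustrate.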

Media II: Biographical Inventories

The second type of self-descriptive technique, as distinguished
from the trait list, is the biographical inventory. The basic
assumption here is that one's past behavior is a good predictor of his future
behavior. In his discussion of personality in terms of associative
learning, Guthrie (13, p. 66) says that a person's "past affiliations offer
better and more specific predictions of his future than any of the
traits that we usually think of as personality traits. But it is just this
predictive value that is required of a personality trait and nothing
else."

Method. The method with which we are most familiar in
connection with the biographical inventory is that of the Aptitude
Index used by many life insurance companies since the birth of
applied psychology, and applied with signal success to the selection
of aircraft pilots by Kelly in the Navy and Shaffer in the Air Force
during World War II. In the last instance (12) it consisted of using
available knowledge or hunches as to the backgrounds of successful
and unsuccessful fliers to write multiple-choice biographical
items, and of validating responses to these against a success-fail
criterion. Thus men who had relatives who held private pilots'
licenses proved more likely to succeed than did those without this
kind of prior contact with flying. This finding led, in turn, to the
hypothesis that prior favorable contact with flying makes for success,
and other prior-contact experiences were canvassed in order to
supply more experience items for the inventory. These items were
in turn validated, and retained only if they predicted success.
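The item-validation step described above, in which a response is retained only if it predicts success, can be sketched by comparing the success rate among those who chose each response option with the overall success rate. The item, the responses, and the outcomes used to exercise this code are invented for illustration, and the 0.10 margin is an assumed threshold rather than a figure from the pilot studies.

```python
from collections import defaultdict


def option_success_rates(answers, succeeded):
    """Success rate among the people who chose each response option of one item."""
    chosen = defaultdict(int)
    wins = defaultdict(int)
    for option, outcome in zip(answers, succeeded):
        chosen[option] += 1
        wins[option] += outcome
    return {option: wins[option] / chosen[option] for option in chosen}


def predictive_options(answers, succeeded, margin=0.10):
    """Options whose success rate departs from the base rate by at least `margin`.

    Returns {option: rate minus base rate}; an empty dict suggests dropping the item.
    """
    base_rate = sum(succeeded) / len(succeeded)
    rates = option_success_rates(answers, succeeded)
    return {opt: rate - base_rate for opt, rate in rates.items()
            if abs(rate - base_rate) >= margin}
```

For a hypothetical yes/no item such as "Does a relative hold a private pilot's license?", an option that clears the margin in one sample would still need to be checked on a fresh sample before it is trusted.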
This method suggested modifications, developed simultaneously
by Siegel and myself. In my case it was in the Career Pattern Study
(32) and in studying success and failure in survival training in the
Strategic Air Command (33). In Siegel's case it was in a doctoral
[. . .]. Siegel has followed through by publishing his [. . .] and
conducting further studies of the method and its applications with
high school boys and with engineering applicants at the General [. . .].

The modification consists of pushing beyond low-order hypotheses
concerning the relationship between past and future experiences to
[. . .] that organize data on a [. . .], using constellations of experiences
as a basis for inferring [. . .] traits. Siegel's approach consisted
of organizing [. . .] experiences into clusters [. . .], establishing
the [. . .] of the scales and of the trait names assigned them by
internal consistency and correlational methods; he provided, in
addition, some data on the predictive validity of the [. . .].

My approach was similar [. . .] in my Strategic Air Command and
Career Pattern Study research, but in applying the method to
telephone operators, Martin Heyde and I have analyzed operator data
obtained from application blanks and interviews in relation to
turnover, in order to develop a hypothetical model of the stable
and of the unstable telephone operator. From this model, described
in terms of biographical and trait data, we derived a list of
personality traits believed to be significant in turnover in that
occupation. Actual and presumably related biographical and life
experience items were then written in multiple-choice form, from 15
to 30 items surviving the various editorial processes for each of the
hypothesized personality traits.

We sought to measure the trait "independence-dependence" by
answers to questions concerning the age at which the subject first
started using make-up, choosing her own clothes, dating, taking
overnight trips without her parents, etc. This inventory is now
being given to operator applicants in major cities in several regions
of the country, and turnover data are being collected on those
hired. Three types of analyses of the results are planned. One is a
factor analysis of the items to test our hypotheses concerning the
trait or factor structure of our biographical data and experience
variables. The second is a study of the relationship between these
traits or factors and turnover. The third is a cross-validation study
of the items against turnover criteria to develop an empirical
scoring key. Validated factors and empirical scoring keys will be
compared.
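The planned cross-validation can be sketched as follows: derive an empirical key on one half of the sample, then see how well keyed scores predict the criterion in the other half, where the key has had no chance to capitalize on chance. This is a minimal illustration under assumptions that are not in the text: 0/1 item responses, a 0/1 criterion (1 = stayed on the job), an alternating split of cases, and a hypothetical 0.20 margin for keying an item. It is not the scoring procedure of the telephone-operator study.

```python
from statistics import mean, pstdev


def derive_key(rows, stayed, margin=0.20):
    """Key items whose endorsement rate differs between stayers (1) and leavers (0).

    Returns {item index: +1 or -1}; +1 means endorsing the item is typical of stayers.
    """
    key = {}
    for j in range(len(rows[0])):
        stay = [r[j] for r, s in zip(rows, stayed) if s == 1]
        leave = [r[j] for r, s in zip(rows, stayed) if s == 0]
        if not stay or not leave:
            continue  # both groups are needed for a comparison
        diff = mean(stay) - mean(leave)
        if abs(diff) >= margin:
            key[j] = 1 if diff > 0 else -1
    return key


def keyed_score(row, key):
    """Sum of keyed item responses for one applicant."""
    return sum(weight * row[j] for j, weight in key.items())


def pearson(x, y):
    """Product-moment correlation; 0.0 if either variable is constant."""
    sx, sy = pstdev(x), pstdev(y)
    if sx == 0 or sy == 0:
        return 0.0
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) * sx * sy)


def cross_validate(rows, stayed, margin=0.20):
    """Derive the key on even-numbered cases, then validate it on odd-numbered ones."""
    key = derive_key(rows[0::2], stayed[0::2], margin)
    holdout_scores = [keyed_score(r, key) for r in rows[1::2]]
    return key, pearson(holdout_scores, stayed[1::2])
```

In practice the hold-out correlation is usually lower than the one in the derivation sample, and that shrinkage is what a cross-validation study is meant to expose.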

Measures. Two types of measures are thus derivable from
biographical inventories of the type with which some of us have been
experimenting. One is the conventional empirical scale, of most
interest to the classifier of students and job applicants and,
although biographical inventories apparently have not been tried on them,
of mental patients. The other is the trait or factor scale, of which
Siegel developed ten for his general high school inventory and of
which I have developed several for my custom-built inventories. It
is the trait scales empirically validated against external criteria that
are of most interest to us here, as a potentially meaningful means of
assessing personality.

Evaluation. How valid these biographical data measures are as
indices of personality traits, and how sound the procedure is for
inferring personality traits from constellations of experience data,
is not fully demonstrated. I have devoted this much attention to
this new approach to biographical data because the results so far
indicate that it may prove a valid and more objective method of
assessing personality than the self-report methods that call for trait
descriptions.

In closing this chapter, which has of necessity covered much
ground briefly, I should like to relate the organization which I have
used here to that used by Leary and Coffey (23) in their work on
personality. They distinguish among public, private, and symbolic
levels of personality measurement. The public level is that at which
the individual appears to others (performance, in my framework); the
private level [conscious level in Leary's latest version (22)] is that
at which he appears to himself (self-description, in my scheme); and
the symbolic (private level in Leary's book) is that at which he
reveals himself in projective materials (projection, in my discussion).
Leary's position is that each of these levels has its values and uses
in personality assessment. I have already quoted Allport (1) to the
same effect. While recognizing that at present we have data to
justify the practical use of methods and information from only the
public and private levels, from only the performance and
self-descriptive approaches, it seems to me that we must agree with Leary
and with Allport that all three levels or approaches should be used
in the scientific study of personality if we are eventually to attain
the understanding and the control that we desire.
Personality assessment has its inspiration in many distinct fields,
including such specialty areas as
clinical, educational, counseling, and experimental psychology. No
matter where objective testing arises, it is a pointless procedure
to make measurements and scales that are unrelated to meaningful
personality structures. Consequently, personality assessment and
basic theoretical research on personality become one and the same
enterprise.


Strategy of Personality Research

[. . .] on the strategy of personality research is [. . .] to justify
my later inferences. Since we nowadays [. . .] explicitness of
methodology and models, may I [. . .] from [. . .] on personality
structure come [. . .].

[. . .] the basic scientific methods in which hypotheses are tested,
revised, and re-created are reducible to two only, namely, the
univariate, controlled [. . .] borrowed from classical physics, and the
multivariate [. . .], as illustrated in factor analysis, the multiple
[. . .], canonical correlations, etc. In the first approach we try to
hold constant everything but that which we are
interested in manipulating, and we then observe how one measure,
the dependent variable, changes with the changes we produce in
the independent variables. In the multivariate approach, on the
other hand, we enter the experiment with a great number of
variables, usually allowing them to vary as they vary in nature, without
attempting to control them artificially in any way. We then tease
out the relationships among them by the superior statistical potency
of the methods which have been developed, principally in the life
sciences, since the days of classical physics.
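As a deliberately simplified picture of "teasing out" structure from many freely varying measures, the sketch below computes a correlation matrix and then extracts its largest principal component by power iteration. This is a principal-component approximation, not a full common-factor analysis (there is no communality estimation and no rotation). It assumes that no variable is constant, that the largest eigenvalue is distinct, and that the all-ones starting vector is not orthogonal to its eigenvector; the function names are hypothetical.

```python
from math import sqrt
from statistics import mean, pstdev


def correlation_matrix(columns):
    """Pearson correlations among variables; columns[k] holds variable k's scores."""
    def r(x, y):
        mx, my = mean(x), mean(y)
        cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)
        return cov / (pstdev(x) * pstdev(y))  # assumes non-constant variables
    return [[r(x, y) for y in columns] for x in columns]


def first_component(R, iterations=200):
    """Largest principal component of a correlation matrix by power iteration.

    Returns (loadings, eigenvalue), where loadings = unit eigenvector * sqrt(eigenvalue).
    """
    n = len(R)
    v = [1.0] * n
    for _ in range(iterations):
        w = [sum(R[i][k] * v[k] for k in range(n)) for i in range(n)]
        norm = sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    Rv = [sum(R[i][k] * v[k] for k in range(n)) for i in range(n)]
    eigenvalue = sum(a * b for a, b in zip(v, Rv))  # Rayleigh quotient
    return [x * sqrt(eigenvalue) for x in v], eigenvalue
```

For three measures that all intercorrelate 0.5, the largest eigenvalue is 2 and each measure loads about 0.82 on the common component: a single pattern that no one of the three pairwise correlations displays by itself.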


Comparison of Methods 

There are advantages and disadvantages to both methods, though
you would sometimes think from the pious expressions of
brass-instrument psychologists that all scientific purity lies with the
classical univariate method. Actually, the multivariate method can
claim three great scientific advantages. First, it can deal with
patterns and wholistic concepts. A clinician in my presence once
remarked to a psychologist that he proposed to do some experiments
on the relation of the superego to school achievement. Whereupon
the classical experimentalist, a man as direct as he was eminent,
snorted, "What is a superego? I have never seen one."

The implications of this remark should really be a plague on both
their houses. So long as the older type of experimenter deals only
with single variables he must remain blind to anything that requires
demonstration as a complex pattern. But equally the clinician,
although taking many variables into account, is unable objectively
and scientifically to convince others, e.g., to show that the superego
is not a myth but a visible pattern, unless he commands the
powers of mathematical analysis to determine and demonstrate the
loading pattern. The multivariate statistical methods possess this
power, and, as I shall hope to point out in my summary, it is possible
to define the superego, various drives, and a number of complex
temperament patterns to a useful degree of exactness by
factor-analytic means. Furthermore, when these constructs are [. . .]
as factors, they can enter into exact experiment [. . .]
single concrete variable.

The second advantage of the multivariate method is its sheer
business efficiency. If you go to the labor of [. . .] two
hundred variables, in a hundred pairs of two, on a [. . .],
you get by classical experiment evidence on a hundred relationships.
If, on the other hand, you do the same amount of experimental work
and use a multivariate method of analysis, you throw light on the
nature of approximately 2,000 relationships. That is to say, you
possess the correlations in a matrix of 200 variables. Actually, this
is something more than an enormous (twenty to one) gain in
efficiency.

When the 100 relationships of the univariate experimental
design are taken from many different samples, as commonly happens
when an unfortunate reviewer of an area in the Psychological
Bulletin is trying to make sense out of a hundred independent researches,
the findings are essentially incomparable, statistically, because of
coming from different samples, and they are of questionable
comparability experimentally, because they are always tainted by the
idiosyncrasies of the various investigators and their locations. The
hypothesis-testing power, and especially the hypothesis-creating
power, of the multivariate experiment is here far greater, because
we know that all the relations are comparable, having been made
on the same group.

Additionally, the factor analytic method has special resolving
powers, in terms of discerning meaningful patterns among the
correlations. We get what a philosopher might call "emergents" from
the accumulated, criss-crossing relationships, such as can never
come from reasoning about single relationships or from the blind
game of partialling this influence out from that, one at a time. In
personality research it means that we are enabled to detect the
major structures operating across this whole field, whereas when
one works with variables two or three at a time, trying to partial
out this from that, one is apt to run around in the prison circle of
one's own feeble vision of possibilities.
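For comparison, the "partialling" referred to here removes one variable's influence at a time. The standard first-order partial correlation of x and y with z held constant is written below as a small function; the name is hypothetical and the example values are arbitrary.

```python
from math import sqrt


def partial_r(r_xy, r_xz, r_yz):
    """First-order partial correlation r_xy.z: x and y with z held constant."""
    denom = sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))
    if denom == 0:
        raise ValueError("z is perfectly correlated with x or y; partial r undefined")
    return (r_xy - r_xz * r_yz) / denom
```

Each call removes a single third variable. Chaining such adjustments pair by pair through a large set of measures is the piecemeal procedure that the text contrasts with analyzing the whole correlation matrix at once.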
The third advantage of the multivariate method does not belong
[. . .] in psychology, but is specific to its application in
the field of personality and clinical study. It resides in the fact
that human beings decline to let you do controlled manipulative
[. . .] of vital emotional importance to them. If
[. . .] effect upon marital couples of a mother-in-law
[. . .] to stay in the [. . .], you would not be well advised to
[. . .] invitations to mothers-in-law, dropping in afterwards
[. . .] very independent [. . .] to your [. . .].

[. . .] manipulative experimental design
in the field of [. . .]: the first is that you ought not to do it,
and [. . .] if you throw ethics aside and proceed, the
artificial [. . .] of the experiment may create a situation quite
different from the naturally occurring one. When you chop pieces
off a man's adrenal glands you do something more than reduce his
adrenal functioning. The multivariate experimenter, like the
clinician, allows life itself to make the experiments, in naturally
functioning organic wholes, and then extracts the causal connections
by superior statistical analytical procedures. If you stick to the
controlled experiment in regard to emotional learning, etc., you are
compelled, like Mowrer, Miller, and others, to move increasingly
away from human beings to animals. This leaves you with the
impossible task of generalizing across from animal behavior to
something vaguely analogous in human behavior. In fact,
methodologically you have allowed the rat to lead you into a worse
cul-de-sac than any maze you ever constructed for him.
If now you look at the so-called clinical method with this broad
dichotomy of method in mind, you will notice that it has several
close parallels to the multivariate method. Both deal with major
emotional events in the lives of human beings, allowing life itself to
provide the source of manipulation, and both work upon wholistic
perceptions of patterns and relations, rather than upon single
variables. Indeed, I think it can be seen that there is really no such
thing as a separate clinical method (unless we are talking about a
therapeutic method), for, when stripped down to its essential, formal
procedures, the clinical method is the multivariate method.
Unfortunately, though it is formally the multivariate method, it lacks
scientific rigor, proceeding by intuition and fallible human memory,
instead of being carried out on exact measurements by an electronic
computer, using a far superior memory and a fully explicit statistical
procedure. In terms of progress in the scientific study of
personality, the clinician has his heart in the right place, but perhaps
we may say that he remains a little fuzzy in the head. The salvation
of the clinical method lies in filling out its cloudy procedures by
structural statistics, decidedly more complex, incidentally, than
those known to univariate methodology. Factor analysis is only one
such statistical model, though it is the best we have achieved so far.


Measurement Follows Structure

But let us now return from this survey of foundations to my first
assertion that measurement must follow structure. I am aware that
this reiteration of “no testing without structure” makes me as popular
among certain kinds of test constructors in educational and
clinical psychology as a Baptist minister reminding people of the
Ten Commandments in an establishment for organized vice. But I
would repeat that you may use the most impressive scaling procedures,
refining Guttman, Coombs, and others to the nth power,
and still be merely engaged in a sort of psychometric chess game,
as far as any psychological understanding of psychological problems
is concerned. If your scale is not guaranteed to deal with something
psychologically meaningful and organic, it cannot help in
psychological procedures. And, incidentally, it does not seem to be
sufficiently realized that a Guttman scale, or any other scaling
method per se, does not guarantee a factor-pure scale. A correctly
scaled scale may still be of any degree of factorial confusion.
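The claim that a perfectly scaled Guttman scale may still be factorially confused can be illustrated with a small simulation. The sketch below is a minimal illustration under invented assumptions: two independent, hypothetical factors, and five items that are nothing but thresholds on the sum of the two. The helper names (`pearson`, `reproducibility`) and all numbers are illustrative, and the error count follows only one of several conventions for the coefficient of reproducibility.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def reproducibility(responses):
    """Guttman coefficient of reproducibility, 1 - errors / (persons * items).

    Items must be ordered from easiest to hardest. A person with total
    score k is expected to pass exactly the first k items; every cell that
    departs from that ideal pattern counts as one error.
    """
    errors = 0
    for row in responses:
        k = sum(row)
        ideal = [1] * k + [0] * (len(row) - k)
        errors += sum(r != i for r, i in zip(row, ideal))
    return 1 - errors / (len(responses) * len(responses[0]))


rng = random.Random(0)
n_persons = 5000
f1 = [rng.gauss(0, 1) for _ in range(n_persons)]  # hypothetical factor 1
f2 = [rng.gauss(0, 1) for _ in range(n_persons)]  # hypothetical factor 2, independent of f1
composite = [a + b for a, b in zip(f1, f2)]       # every item responds to this blend

# Ascending thresholds: item 0 is the easiest to pass, item 4 the hardest.
thresholds = [-1.5, -0.5, 0.0, 0.5, 1.5]
responses = [[int(c > t) for t in thresholds] for c in composite]
scores = [sum(row) for row in responses]

cr = reproducibility(responses)
r1, r2 = pearson(scores, f1), pearson(scores, f2)
print(f"reproducibility = {cr:.3f}")
print(f"r(score, factor 1) = {r1:.2f}, r(score, factor 2) = {r2:.2f}")
```

Because every item is a threshold on the same blend, the response patterns are perfectly cumulative, yet the total score correlates only moderately with each of the two underlying factors and measures neither cleanly.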

When I mention a demonstrable functional unity in what follows,
I refer technically not only to a pattern of covarying parts which
can be demonstrated as a unique, replicable, determinate factor
in terms of factor analysis, but also to a pattern which additionally
could be shown to function as a whole by univariate, controlled experiment.
That is to say, the pattern should show itself not only
by a person who is higher in one element of it consistently being
higher in the other elements, but also by the parts varying together
from occasion to occasion when an experimental influence which
changes this trait is brought to bear.

Within multivariate methods, this means that the factor pattern
out of which the construct or concept arises must be demonstrated
not only by the classical R-technique, but also by the longitudinal
P-technique. It may also be demonstrated by other factor analytic
experimental designs, such as the condition-response design, in which
one simultaneously factors in a single matrix both the various stimuli
that might cause the pattern to change in level and all the manifestations
by which the pattern is recognized. In short, to ensure that
a unitary trait is sound in wind and limb, it should be thumped in
many different parts. Thus, in Scheier's work on the nature of
anxiety, which has come out with certain clean-cut results which
I shall discuss in a moment, it was first demonstrated that some ten
psychological and physiological variables repeatedly emerged as
salients in a single factor in studies dealing with individual differences
in anxiety level, i.e., by R-technique.
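The difference between R- and P-technique lies in which slice of the data is correlated. The sketch below assumes a hypothetical “data box” of persons by occasions by variables, simulated so that one latent anxiety-like level (a stable trait plus a daily fluctuation) drives three variables; the loadings, sample sizes, and noise levels are invented for illustration. R-technique correlates the variables across persons on one occasion; P-technique correlates the same variables across occasions within one person.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(rows):
    """Correlate the variables (columns) of a list of observation rows."""
    cols = list(zip(*rows))
    return [[pearson(a, b) for b in cols] for a in cols]


rng = random.Random(1)
n_persons, n_days = 300, 300
loadings = [0.8, 0.7, 0.6]  # hypothetical salients of one anxiety-like factor

# Hypothetical data box: box[person][day] holds that day's three scores.
trait = [rng.gauss(0, 1) for _ in range(n_persons)]
box = []
for person in range(n_persons):
    days = []
    for _ in range(n_days):
        level = trait[person] + rng.gauss(0, 1)  # stable trait plus daily state
        days.append([w * level + rng.gauss(0, 0.5) for w in loadings])
    box.append(days)

# R-technique: one occasion (day 0), correlated across persons.
r_matrix = correlation_matrix([box[person][0] for person in range(n_persons)])
# P-technique: one person (person 0), correlated across occasions.
p_matrix = correlation_matrix(box[0])

for label, matrix in (("R", r_matrix), ("P", p_matrix)):
    print(label, [[round(v, 2) for v in row] for row in matrix])
```

When the same functional unity operates in both slices, both matrices show positive covariation among all three variables; the within-person correlations come out somewhat lower here because the stable trait contributes no variance across one person's days.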
After this R-technique demonstration of the boundaries of anxiety
as an individual difference trait, a longitudinal study was made in
which fluctuations (in these salient variables discovered on the
R-technique factor) were measured from day to day under the various
naturally occurring anxiety stimuli of daily life. A longitudinal
factor analysis, by P-technique, then turned out much the same factor
pattern as had been obtained by R-technique. There were some
differences of emphasis, but it was clearly the same anxiety factor,
marked by the same major variables.
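The text does not say how the R- and P-technique patterns were compared. One common index for such a comparison is Tucker's coefficient of congruence, the cosine between two columns of loadings on the same variables. The sketch below applies it to invented loadings; values near 1 indicate matching patterns, while values near 0 indicate unrelated ones.

```python
import math


def congruence(a, b):
    """Tucker's coefficient of congruence between two loading vectors."""
    num = sum(x * y for x, y in zip(a, b))
    return num / math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))


# Hypothetical loadings of the same six variables on the anxiety factor.
r_technique = [0.70, 0.60, 0.55, 0.50, 0.40, 0.10]
p_technique = [0.60, 0.65, 0.40, 0.50, 0.30, 0.20]  # same factor, shifted emphases
unrelated = [0.10, -0.10, 0.00, 0.60, -0.50, 0.70]  # a different factor

print(round(congruence(r_technique, p_technique), 3))
print(round(congruence(r_technique, unrelated), 3))
```

Unlike a correlation between loading columns, the congruence coefficient is not centered, so it treats a shared pattern of high and near-zero loadings as agreement, which is what matching a factor pattern requires.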

A third phase of the research consisted in measuring a large number
of people on this array of variables and then submitting them,
in an analysis combining the factorial design of analysis of variance
with factor analysis, to a number of what are commonly considered
anxiety-provoking stimuli, such as important examinations, a discussion
of imaginary diseases, some probing of their economic condition,
etc. Correlating in the stimulus differences with all the response
differences resulted in the reappearance of the same anxiety
factor pattern. In this condition-response design, however, it was
additionally loaded with the stimuli which are effective in producing
the anxiety response pattern. Scheier's work on the measurement
of anxiety thus illustrates the full present scope of multivariate
method usage and shows how a practical measurement of high validity
and determinateness can result.
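The condition-response idea, correlating a stimulus variable in alongside the responses, can be sketched by dummy-coding the stimulus (1 = exposed, 0 = control) and factoring it together with the response variables. As a simplification, the sketch extracts only the first principal component by power iteration rather than performing a full factor analysis; the effect size, loadings, sample size, and variable names are hypothetical.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def first_component(matrix, iterations=500):
    """Loadings on the largest principal component, found by power iteration."""
    k = len(matrix)
    v = [1.0] * k
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(k)) for i in range(k)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient of the unit vector v gives the matching eigenvalue.
    eigenvalue = sum(v[i] * matrix[i][j] * v[j] for i in range(k) for j in range(k))
    return [x * math.sqrt(eigenvalue) for x in v]


rng = random.Random(3)
names = ["exam_stimulus", "response_a", "response_b", "response_c", "unrelated"]
columns = [[] for _ in names]
for subject in range(1000):
    stimulus = subject % 2                      # 1 = faced the exam, 0 = control
    anxiety = 1.5 * stimulus + rng.gauss(0, 1)  # the stimulus raises the latent level
    row = [stimulus] + [w * anxiety + rng.gauss(0, 0.5) for w in (0.8, 0.7, 0.6)]
    row.append(rng.gauss(0, 1))                 # a variable outside the pattern
    for column, value in zip(columns, row):
        column.append(value)

matrix = [[pearson(a, b) for b in columns] for a in columns]
component = first_component(matrix)
for name, loading in zip(names, component):
    print(f"{name:14s} {loading:+.2f}")
```

The stimulus column loads on the same component as the response variables, while the unrelated variable stays near zero; this is the sense in which the factor comes out “additionally loaded with the stimuli.”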
This digression on complex issues of method may have been so
brief as to evoke the comment that for those who knew them already
it was unnecessary, and for those who did not, it was too short
to carry the full implications. But we must move on with the statement
that if this argument is fully examined, it provides justification
for believing factor analytic findings rather than clinical impressions.
It also prevents our aligning ourselves, on the other hand, with that
compulsively accurate psychometrics of scales which still narrowly
persists in the old faculty psychology of supposing that where there
is a single name there must be a single function.


A Brief Review of Factor Analytic Findings 

Although factor analytic findings over the last fifteen years have
been evaluated elsewhere (4), it will be helpful here to give a brief
sketch of the substantive findings which are the necessary basis
for the measurement theory I have to discuss. These results in turn
are largely the outcome of certain aims and canons of research
method worked out in a first attempt to integrate the field, in
my Description and Measurement of Personality (…), some years
ago. In the first place, our laboratory has always aimed to gather
data widely and simultaneously over the three chief media
of personality observation: L data, or life records of behavior;
Q, or questionnaire data; and T, or objective test data. In the life
record medium, personality is observed in the natural situation
by time sampling, rating, or keeping of records on particular events,
e.g., achievements, accidents, etc. This is, of course, the criterion
medium, in the sense of being an external or cultural criterion for
any testing. In the questionnaire medium the person responds by
giving his impressions of himself, limited by his own self-knowledge
and willingness to disclose. In the objective¹ test medium, there
is no question of introspective self-evaluation as in the questionnaire,
but only of actual performance or response in a miniature
situation, in which the subject does not know what aspects of his
performance are being scored and interpreted.

At each age level at which investigations have been made we 
have always begun by operating in the first of these media, because 
it connects most readily with existing concepts in the field, permit- 
ting interpretations of the factors in popular, clinical, and general 
terms. An immediate integration of concepts is then possible be- 
cause the measurements use the same words and situations of behavior
as are covered by the clinicians, guidance psychologists,
educators, and others. It also has the advantage that it permits use 
of a personality sphere concept, that is to say, the notion of a strati- 
fied sample of variables from the total realm of behavior. Inciden- 
tally, for those interested in perfecting the factor analytic approach, 
this ability to introduce a concept of a population of variables be- 
comes very important. 


Canons of Procedure 


In adopting this simultaneous, three-fold observation of personality,
it was our conviction that any important dimension of personality
should break through all three, showing itself at once as a
factor pattern in behavior rating data, in the questionnaire-response
patterns, and in objective personality tests. This theory has been
only partially vindicated, and some tantalizing exceptions persist.

¹ A clarification of “objective” is required, since there are commonly two degrees of objectivity in test construction: (1) objectivity of scoring, plus (2) objectivity in […] self-appraisal. It is in the latter, complete, sense that […] objective […], and we would suggest the term conspective for a test that is […] in scoring, i.e., that has a high conspect reliability: the score would be agreed upon between two different psychometrists. […] which might be called a rative test, in […] of a single person and is made by […]. […] methods will note that Edwards in his […] objective in a way which allows both the true objective test and the […] test to come under the same heading. […] of test development, the difference between the rative and the true objective test is greater than that between the rative and the conspective test, for it deals with the whole test character, whereas the latter deals with the mode of scoring. For these reasons, systematically adopted throughout 15 years of publication from our laboratory, we have used objective for the non-self-appraisal type of test.
Throughout this discovery of structure as a basis for measurement
it has been a canon of research procedure that factors shall be determined
by simple structure principles, and other principles permitting
a unique, objective factor solution quite independent of any
psychological preconceptions which the observer may have about
personality structure. Psychologists who have been using factor
analysis by rotating for “psychological meaning” are merely having
a pleasant game perpetuating their own superstitions or prejudices.
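One way to make rotation independent of the investigator's preconceptions is to score each candidate position by a fixed, blind criterion. A simple criterion in the spirit of simple structure is the hyperplane count: the share of a factor's loadings that are effectively zero. The sketch assumes a hyperplane band of ±0.10 and an invented two-factor loading matrix, and compares that solution with the same one rotated through 45 degrees.

```python
import math


def hyperplane_count(loadings, width=0.10):
    """Share of each factor's loadings that fall inside the band |a| <= width."""
    n_vars = len(loadings)
    return [
        sum(abs(row[j]) <= width for row in loadings) / n_vars
        for j in range(len(loadings[0]))
    ]


def rotate(loadings, degrees):
    """Orthogonal rotation of a two-factor loading matrix."""
    t = math.radians(degrees)
    c, s = math.cos(t), math.sin(t)
    return [[a * c + b * s, -a * s + b * c] for a, b in loadings]


# Hypothetical two-factor solution for six variables.
simple = [[0.80, 0.05], [0.70, -0.02], [0.60, 0.08],
          [0.03, 0.75], [-0.05, 0.65], [0.09, 0.70]]
turned = rotate(simple, 45)  # same communalities, different axes

print(hyperplane_count(simple))
print(hyperplane_count(turned))
```

Both solutions reproduce the variables equally well, since an orthogonal rotation leaves each variable's communality unchanged; the choice between them must therefore rest on some such explicit criterion rather than on a feeling of psychological meaning.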

Some years ago I talked at Tübingen with the German psychologist,
Kretschmer, who has done such striking clinical experimental
work in bringing out the full nature of the schizothyme tempera- 
ment pattern. Whenever I showed him an experimental pattern 
in factor analysis that agreed with his clinical impressions, he would 
say “factor analysis is a remarkably important scientific tool.” But 
when I showed a pattern that corresponded to no known clinical 
pattern, his inclination was at once to assume it to be an artifact, 
and immediately to lose interest in it. This I cite only as a rather 
amusing and well developed instance of the attitude that, together 
with some defects of statistical education, has kept the clinical psy- 
chologist from understanding the importance of these factored 
measurement developments for his work. 

Indeed, there is no need especially to pillory clinical psychologists, 
for psychologists in general seem rather prone, relative to physical 
scientists, to dependence on subjective conviction. Yet if we are 
dealing with a science rather than a religion, we should welcome 
objective methods which surprise us by turning up something that 
does not in the least fit what we knew before. Factor analysis has, 
in fact, produced surprises in the clinical field, for those who can 
see them, much as the microscope did in biology. Notably it has 
turned up at least a dozen clear cut patterns in the personality 
field, that contribute as much to the variance of behavior as any 
such familiar concepts as schizothymia, ego strength, dominance, 
etc., which have nevertheless never been visible to the naked eye 
of the clinician or named or discussed. These structures have not 
yet been accepted as the challenge to existing clinical theory and 
formulations that they really are, for they have power to yield pre- 
dictions of criterion behavior impossible from the familiar concepts. 

A third canon of our research has been that the factor patterns
shall be replicated, in at least two independent researches, before
we begin to give them serious theoretical consideration. To put
this canon into effect requires considerable planning in research,
to ensure that a sufficiency of identical variables to permit matching
is carried over from one study to another. For in this day and age
we can no longer go along with the idea that the identity of a
factor in one study with that in another study can be established
merely by the psychologist's impression of the psychological similarity
of the two. There must be accurate carrying over of salients,
and the use of a quantitative index, such as the salient variable
similarity index, to ensure that the patterns really are alike.

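A minimal sketch of a salient variable similarity index follows. It assumes one commonly cited form of the index, s = (PP + MM − PM − MP) / (PP + MM + PM + MP + ½(PO + MO + OP + OM)), in which each loading is classed as a positive salient (P), a negative salient (M), or a hyperplane loading (O), and each pair of letters counts variables classed that way in the first and second study. The ±0.10 cutoff and the loadings are assumptions for illustration.

```python
from collections import Counter


def classify(loading, cutoff=0.10):
    """Class a loading as positive salient (P), negative salient (M), or hyperplane (O)."""
    if loading >= cutoff:
        return "P"
    if loading <= -cutoff:
        return "M"
    return "O"


def salient_similarity(a, b, cutoff=0.10):
    """Salient variable similarity index for two loading columns on the same variables."""
    counts = Counter((classify(x, cutoff), classify(y, cutoff)) for x, y in zip(a, b))
    agree = counts["P", "P"] + counts["M", "M"]
    disagree = counts["P", "M"] + counts["M", "P"]
    half = counts["P", "O"] + counts["M", "O"] + counts["O", "P"] + counts["O", "M"]
    denominator = agree + disagree + 0.5 * half
    if denominator == 0:
        raise ValueError("no salient loadings in either study")
    return (agree - disagree) / denominator


# Hypothetical loadings of seven matched variables in two independent studies.
study_1 = [0.60, 0.50, 0.40, -0.30, 0.05, 0.02, 0.30]
study_2 = [0.50, 0.45, 0.20, -0.25, 0.00, 0.15, 0.05]
print(salient_similarity(study_1, study_2))
```

Pairs in which both loadings lie in the hyperplane are ignored, while a salient in one study matched by a hyperplane loading in the other counts as half a case in the denominator, pulling the index below 1.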
A fourth principle has been that we should not be too hasty in
interpreting the factors, but should be content to designate them
by an index number in some agreed universal index among psychologists,
such as that which I have proposed as an international
index in the current issue of the Japanese International Journal of
Psychology. A factor will commonly become a recognized part of
the scenery, and a basis for measurement in a good unifactor scale,
some years before its nature is fully understood. In the case of
about half a dozen of the discovered factors, namely, ego strength,
intelligence, anxiety, general neuroticism, schizothymia, and superego
strength, I think the pattern is sufficiently identical with anything
that has ever been called by that name by a responsible
psychologist to justify using these customary names and interpretations,
such as they are. In about another half dozen factors a
pretty definite idea can be formed of the physiological, experiential,
or dynamic influence responsible for the pattern. For example,
surgency-desurgency level is essentially the level of general inhibition,
and seems correlated with frequency of past punishment. The
factor we have called Q3 seems to represent the degree of dynamic
investment in the self sentiment, and so on. While these explanations
are not as perfect and perhaps not as lurid as those that psychoanalysts
are fed with their Freudian mother's milk, they have
the advantage of dealing with demonstrable behavior patterns, and
of permitting measurements of individual differences, with known
validity and reliability, which can be made the basis of experimental
investigation of theory. Surely it is high time that theory began
to build itself around these measurable behavior patterns, frequently
replicated in eight to a dozen researches, instead of the vaguely
perceived and statistically unsubstantiated behavior patterns and
sequences which many clinicians take as the basis for elaborate […]

A last canon of design, in these experiments to put personality
measurement on a functional basis, and functional concepts on a
measurement basis, is that continuity should be established in the
patterns over the whole developmental age range. That is to say,
not only should the functional unity be established at one age level,
by the above two-handed use of R- and P-techniques, but the age
range should be cut by such studies at three- or four-year intervals,
to establish the mode of growth, as one might take slices across
the stem of a plant. This is a big order, and it has not yet been filled,
but sections have recently been taken at 12, 8, and 4 years of age
and are in press.

Longitudinal Analyses 


The hypotheses of measurement here are that some patterns
might be expected to persist over all age sections more persistently
than others. For example, an ability like general intelligence, or a
temperament trait associated firmly with some physiological or
body-build component, would be expected to show itself, perhaps
with some modifications, from the earliest testing period. On the
other hand, an environmental mold pattern, such as the superego
or a sentiment to a specific object, might be expected to appear
only at a given age and to show more pronounced developmental
change in the loading pattern. The work on personality factors was
initially done, for good reasons, at the young adult level, but the
researches of Coan, Peterson, Gruen, and others, maintaining the
combinations of life record data, questionnaire data, and objective
test data, show that all but three or four of the factors established
at the adult level can be traced down through childhood and even
into infancy. For example, in the factor analyses of time samplings
of behavior made at the four-year-old level, we can clearly see the
cyclothyme-schizothyme factor, the dominance-submissiveness factor,
the surgency-desurgency factor, the paranoid factor, the ego
strength factor, and so on, operating in the nursery school world.

On the practical side, for the benefit of those who wish to do
longitudinal research in personality over a sufficient interval, we
are in process of constructing measures of these factors in the
questionnaire medium. Thus, 14 of the 16 factors in the adult 16
Personality Factor Questionnaire (3) can be demonstrated
in the range from 12 to 16 years of age, in a test called the High
School Personality Questionnaire (5). Twelve of the factors can
still be clearly recognized at the nine-year level and are being put
into the Child Personality Questionnaire.
Peterson has made sets
of questions, which must, of course, be given orally, which get at
these personality factors at the nursery school level. There are many 
technical difficulties in getting a good series of personality factor 
questionnaires to operate meaningfully from the four year level 
right up through the adult level, but these difficulties must be over- 
come, because such longitudinal studies are essential both for un- 
derstanding personality and for the success of applied psychology. 

Indeed, there is both a great need and great opportunity at the 
moment for longitudinal studies in personality structure. Such stud- 
ies will, first, establish more definitely the identity of the factors 
found at the earlier level with those at the later level, by repetitive 
measures on the same children at intervals; second, show which 
factors are most subject to environmental influences, and if so, to 
what environmental influences; third, show the general curve of 
change in these personality factors in the same sense as we have 
established the normal trend in the intelligence factor; and last, 
suggest in what way the pattern of behavior typically changes with 
age. 

In regard to these changes in the test weights in the pattern to 
be measured, we may instance that the ego strength factor in four 
year old children loads freedom from temper tantrums, freedom 
from enuresis, infrequency of headaches and psychosomatic disor- 
ders, infrequency of manifestations of jealousy, etc. By eleven years 
of age, enuresis has dropped out of the loading pattern, the main 
emotional stability versus instability variables remain, and some 
new elements have come in. Similarly, in the dominance factor, 
disobedience, sulking and “talking back” are prominent in the time 
sampling variables at the early age, whereas by the adult level, 
these are no longer present. The disobedience has become uncon- 
ventionality, but the talkativeness has disappeared and the dominant 
adult is, if anything, rather more silent than the average. 


Validity of Personality Measurements 

In discussions on the validity of personality measurements, it is
necessary constantly to distinguish between concept or construct
validity, on the one hand, and external, or cultural, validity on the
other. The former is defined by the correlation between a given
test, questionnaire, or rating scale and the factor as derived
from its own criterion variables. Thus we might validate a test
against such factor constructs as anxiety, or against ego strength, or
against surgency, or against schizothymia. The external or cultural
validity is never validity singular but validity plural. That is to say,
there are thousands of things against which a factor's predictive
power could be tried and the correlations known in the interests of
interpretation; but no one of them is the criterion. For instance,
the use of the 16 PF test has yielded a great many significant personality
factor correlations, for example, with success in school,
prognostic rating in a clinic, automobile accident proneness, alcoholism,
etc., and these have greatly enriched the original interpretations
based on factor content alone.
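Concept validity, as defined above, is the correlation between a test and the factor. Since the factor itself is not directly observed, the sketch below estimates it crudely by the unit-weighted mean of standardized marker variables and then correlates a new test with that estimate. All variables are simulated; the loadings, the sample size, and the unit-weighting shortcut are assumptions, and a proper factor-score estimate would weight the markers differentially.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def standardize(xs):
    """Convert raw scores to z scores."""
    n = len(xs)
    mean = sum(xs) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in xs) / n)
    return [(x - mean) / sd for x in xs]


rng = random.Random(2)
n = 3000
factor = [rng.gauss(0, 1) for _ in range(n)]  # never observed in practice

# Hypothetical criterion (marker) variables that define the factor.
marker_loadings = [0.80, 0.75, 0.70, 0.65]
markers = [
    [w * f + math.sqrt(1 - w * w) * rng.gauss(0, 1) for f in factor]
    for w in marker_loadings
]

# Crude factor-score estimate: unit-weighted mean of the standardized markers.
z = [standardize(m) for m in markers]
factor_estimate = [sum(values) / len(values) for values in zip(*z)]

# Hypothetical new test whose construct validity is in question.
new_test = [0.6 * f + 0.8 * rng.gauss(0, 1) for f in factor]

validity = pearson(new_test, factor_estimate)
print(f"construct validity estimate: {validity:.2f}")
```

Because the factor estimate is itself imperfect, the resulting coefficient somewhat understates the test's correlation with the factor proper, which is 0.6 by construction in this simulation.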

One of the first inquiries to be made about the nature of a factor,
indeed it should be the routine inquiry before making any more
specific hypotheses, is to test whether it is largely hereditarily determined
or substantially a product of learning and environment.
Obviously, this is of basic importance both for theory and for the
proper practical use of the measurement. Indeed, one of the chief
claims of the factorially unitary measurement is that it permits something
more than merely statistical prediction, namely, an estimate
of criterion performance that takes into account whatever general
psychological knowledge about the natural history of a trait permits
us additionally to infer. Fortunately, some fairly extensive nature-nurture
studies have already placed the principal factors in perspective,
in relation to such older factors as Spearman's g. For example,
we know from multiple variance analysis studies that the cyclothyme-schizothyme
factor is largely hereditarily determined, that the
surgency-desurgency source trait is largely environmentally determined,
that the level of dominance-submission is about equally a product
of constitution and familial-environmental influences, and
so on.
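The multiple variance analysis mentioned here compares several kinds of family groups. A much simpler twin-only estimate, Falconer's formula, is sketched below purely as a stand-in to show the kind of partition such studies aim at; it is not the method those studies used, it rests on strong simplifying assumptions (additive genetic effects, equal environments for both kinds of twins), and the twin correlations are hypothetical.

```python
def falconer(r_mz, r_dz):
    """Split trait variance using identical (MZ) and fraternal (DZ) twin correlations.

    Returns (heritability, shared environment, non-shared environment plus error)
    under Falconer's simplifying assumptions.
    """
    h2 = 2 * (r_mz - r_dz)
    c2 = r_mz - h2
    e2 = 1 - r_mz
    return h2, c2, e2


# Hypothetical twin correlations for two contrasting traits.
for trait, r_mz, r_dz in (("trait X", 0.80, 0.45), ("trait Y", 0.60, 0.55)):
    h2, c2, e2 = falconer(r_mz, r_dz)
    print(f"{trait}: heredity {h2:.2f}, shared env. {c2:.2f}, other env. {e2:.2f}")
```

With these invented correlations the first trait comes out largely hereditary and the second largely environmental, the kind of contrast the text draws between cyclothymia-schizothymia and surgency-desurgency.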

Although the greater meaningfulness of personality measurements
based on factors arises from the possibility of building around each
of these functional unities a rich natural history, the actual […]
of such knowledge has barely begun, because of the […] of satisfactory
proof of the factors themselves. Tests that can be and have been
spawned with much greater ease have accumulated gargantuan
standardizations, as well as the momentum of enormous numbers of past
students whose […] to be exclusively in the rituals of administering them.
Like the first small mammals entering a world possessed by […],
the lately arriving factored tests have a validation largely in the
future.
Dynamic Calculation

The most recent, and as yet scarcely noticed, development of
factored measures lies in that area of dynamic calculation which
is so vital to clinical psychology and to motivation theory. This
rests on the discovery that the drive patterns in man can be established
by the factoring of collections of objective motivational
measures. Sex, self-assertion, fear, and six other drive patterns have
been replicated now in three successive studies by these means.
Alongside these easily recognizable drive patterns, there occur patterns
that closer scrutiny suggests can only be acquired dynamic
sentiments, such as the self sentiment, the sentiment to religion,
and the sentiment pattern of attitudes and interests acquired about
one's profession.

In these studies an attitude is defined as a stimulus-response habit.
By this model the strength of an interest, that is, the need to react,
in regard to any course of action, can be expressed by a specification
equation, weighting the tension levels of the various drives and
sentiment systems as measured in the given individual.
I must refer you to my recent book on Personality and Motivation
Structure and Measurement for the postulates, and the chief
equations, utilizable in the dynamic calculus which develops on this
basis. The non-arbitrariness of the drive patterns makes possible
unambiguous measurements. For example, it is found that the
achievement motive can be resolved into three distinctive components,
a drive and two sentiment structures. On the basis of such
unambiguous, functionally distinct and replicable measurements,
the experimental investigation of dynamic laws and motivational
theories can go forward more exactly and more subtly than before.
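In its simplest linear form, a specification equation of this kind predicts the strength of an interest as a weighted sum of the individual's levels on the drives and sentiments. The sketch below assumes that linear form; the weights, the choice of four dynamic factors, and the standard-score levels for two people are hypothetical.

```python
def interest_strength(weights, levels):
    """Linear specification equation: sum over dynamic factors of weight * level."""
    return sum(b * levels.get(name, 0.0) for name, b in weights.items())


# Hypothetical weights describing one attitude (e.g. an interest in career advancement).
weights = {
    "self_assertion": 0.40,    # a drive
    "fear": 0.20,              # a drive
    "self_sentiment": 0.30,    # an acquired sentiment
    "career_sentiment": 0.35,  # an acquired sentiment
}

# Hypothetical standard-score levels measured in two individuals.
person_a = {"self_assertion": 1.0, "fear": -0.5, "self_sentiment": 0.5, "career_sentiment": 2.0}
person_b = {"self_assertion": -0.5, "fear": 1.0, "self_sentiment": 0.0, "career_sentiment": -1.0}

for label, levels in (("A", person_a), ("B", person_b)):
    print(label, round(interest_strength(weights, levels), 2))
```

In this sketch the weights describe the attitude itself, while the levels describe the person, so the same attitude receives different strengths in different people purely because their drive and sentiment levels differ.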


Factoring of Drives 

A development of crucial importance for clinical theories, which
these measurements have made possible, is the factoring of drives
by P-technique to determine their quantitative contribution to the
attitudes, symptoms, and conflicts, conscious and unconscious, of the
individual clinical case. Such a study is now being made, by […]
factoring each of 24 patients, to see the agreement between the
statement of each individual conflict, as based on the psychiatrist's
experience with the case, and the quantitative statement of the
conflict in terms of the dynamic calculus. If the agreement is
reasonably good, I think we shall have
demonstrated a very powerful new clinical tool. Parenthetically,
I may add that to the extent that the agreement turns out to be 
imperfect, one may reasonably have doubts whether the psychia- 
trist or the dynamic calculus is wrong. In fact, our first step if the 
agreement is inadequate will be to bring in a second psychiatrist 
to see how far he agrees with the first! 


New Types of Tests

The development of structured measurement in motivation has 
gone hand in hand with the invention of quite new types of objec- 
tive tests, no longer requiring the actual scores on component in- 
terests and attitudes to rest on the verbal opinionnaire or the open- 
ended, projective type of test. These objective motivation measures 
include some devices using the so-called projective principles, to- 
gether with physiological measures of motivation, learning measures, 
and many others. As far as theory is concerned, the interesting 
point of this analysis is that we seemed to get three distinct moti- 
vation strength factors, apparently corresponding to the id, ego, 
and superego contributions in any given interest. 


Practical Implications 

These theoretical implications will doubtless be much scrutinized 
and debated, but there are some immediately dependable conclu- 
sions for the practical man. First, the classical opinionnaire method 
of measuring attitude-interest strength by verbal self-evaluation 
has quite poor validity, accounting for only about a fifth to a tenth 
of the variance in the main motivation factors, however they are 
interpreted. Consequently, generalizations about attitudes and in- 
terests based only on this instrument could be highly fallacious as 
far as the total variance in interest strengths is concerned. Secondly,
the projective tests, or misperception tests as we prefer to call them,
are not clearly distinguished by any factors from the rest of the
motivation measurement devices. Thus in the theoretical
reconstruction suggested by this work, the classification of motivation
measurements would fall principally into id, ego, and superego
component measures, and the division into projective and
non-projective, physiological and non-physiological, etc., signs of
motivation strength becomes rather pointless.
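The phrase "a fifth to a tenth of the variance" can be made concrete if one assumes, as is usual, that the share of a factor's variance carried by a measure is the square of their correlation (the measure's loading on the factor). The short Python sketch below is an illustration of that arithmetic under this assumption, not a computation reported in the chapter.

```python
import math


def variance_share(loading: float) -> float:
    """Share of a factor's variance accounted for by a measure: the squared loading."""
    return loading ** 2


def loading_for_share(share: float) -> float:
    """Correlation (loading) a measure needs in order to account for `share` of the variance."""
    return math.sqrt(share)


for share in (0.20, 0.10):
    print(f"{share:.0%} of the variance corresponds to a loading of about {loading_for_share(share):.2f}")
```

On this reading, the verbal opinionnaire would correlate only about .32 to .45 with the main motivation factors.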
Measurement of States and Traits


Any comprehensive view of progress in personality assessment
must include the measurement of states as well as the measurement
of traits. The work of Scheier (11) on the measurement of anxiety
provides, as I have briefly mentioned, a very neat methodological
demonstration of conceptual and statistical problems involved in
separating states and traits. Scheier has now checked the anxiety
state pattern in two independent factor analytic studies, and the
anxiety trait pattern in no fewer than eight independent researches.
There is enough similarity in the state and trait patterns to justify
the popular habit of using the term anxiety for both. Both load
a particular set of markers in the questionnaire realm, in objective
personality tests, and in physiological response measures, though
the emphases are interestingly somewhat different. For aught the
early scale makers knew, there might have been four or five distinct
and uncorrelated factors of anxiety rather than a single factor.
As it turns out, there does seem to be a single factor of anxiety,
but these premature scales mix this anxiety factor with the quite
distinct neuroticism factor and a number of other irrelevant and
contaminating factors. It is really not surprising that anyone
surveying the literature of the past ten years is discouraged by almost
every finding being matchable by an equal and opposite finding,
for even when investigators verbally defined anxiety in the same
way, they frequently used a different test for it. It is rather early
to see what the full impact of factor analytic work on anxiety
measurement will be in giving a new momentum to insightful clinical
research. The instrument could permit the emergence of a whole
series of new laws and therapeutic certainties, replacing the present
gropings toward scales of obscure meaning.
One of the certainties which emerged relatively early from this
joint attack by Eysenck's laboratory and our own was that anxiety
and neuroticism are distinct factors. They have a slight obliquity,
but, as I have shown with substantiating evidence, they can be
measured with satisfactory reliability by objective tests, and, when
so measured, it becomes evident that a person can stand at any
point on one while occupying any position on the other. If
the more tentative evidence by Eysenck's coworkers for a
general psychoticism dimension is sound, then anxiety and
neuroticism are, additionally, independent of measured psychoticism.
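To see why a slight obliquity still leaves a person free to stand anywhere on one factor whatever his position on the other, one can simulate standardized scores on two factors correlated at a small value. The obliquity of 0.2 below is a hypothetical figure chosen for illustration, not one reported by Cattell or Eysenck; the sketch counts simulated persons who are more than one standard deviation high on the first factor (labeled anxiety) and more than one standard deviation low on the second (labeled neuroticism).

```python
import math
import random


def simulate_scores(r, n, seed=0):
    """Draw n pairs of standardized scores on two factors whose correlation is r."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        a = rng.gauss(0.0, 1.0)
        e = rng.gauss(0.0, 1.0)
        pairs.append((a, r * a + math.sqrt(1.0 - r * r) * e))
    return pairs


def pearson(pairs):
    """Product-moment correlation of a list of (x, y) pairs."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    return sxy / math.sqrt(sxx * syy)


OBLIQUITY = 0.2  # hypothetical correlation between the anxiety and neuroticism factors
pairs = simulate_scores(OBLIQUITY, n=20000)
discordant = sum(1 for anxiety, neuroticism in pairs if anxiety > 1 and neuroticism < -1)
print(f"observed r = {pearson(pairs):.2f}; high anxiety with low neuroticism: {discordant} of {len(pairs)}")
```

With r = 0.2 the two factors share only r² = 4 percent of their variance, so such discordant cases still occur in appreciable numbers.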
These, however, arc only local areas of illumination in the factor 
analytic picture, rendered clearer by our clinical familiarity with
the phenomena. Outside these brightly lit spots, in the domains of
the remaining dozen or more personality factors, definitely locatable
but uninterpreted, there exist obscurities and some intriguing
paradoxes, now engrossing the pure researcher. For example, for the
last four years it has been known that two substantial second order
factors can be found among the primary personality factors as
represented in the 16 Personality Factor Questionnaire.

The first of these second order factors brings together the
separate dimensions of ego weakness, high ergic tension, and the
mysterious O factor, sometimes called guilt proneness, which we
have so far hung on to mainly by the symbol O. The second of
these massive second-order factors reveals the existence of a
common influence behind surgency, cyclothymia, dominance, and the
factor which we have called parmia, which is short for high
parasympathetic system dominance. These patterns were confirmed by
the independent study of Karson, at the University of New
Hampshire, and I think that we can now agree that the second of these
two large factors gives substance to the Jungian concept of
extraversion-introversion as a definite, invariant second order factor,
rather than as the mere correlation cluster which it was once thought
to be. That is to say, the general personality dimension of
extraversion really expresses itself in five relatively independent primary
factors: surgency, dominance, parmia, cyclothymia, and lack of
self-sufficiency. The quality of an individual's extraversion therefore
needs to be defined by his separate scores on these five components. 
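A second order factor is obtained by factoring the correlations among the primary factors themselves. The sketch below takes the first principal component of a primary-factor correlation matrix by power iteration, a simplification standing in for the common factor methods used in this work; the correlations among the five extraversion primaries are invented for illustration and are not Cattell's or Karson's data.

```python
import math

PRIMARIES = ["surgency", "dominance", "parmia", "cyclothymia", "lack of self-sufficiency"]

# Hypothetical correlations among the five primaries (symmetric, unit diagonal).
CORR = [
    [1.00, 0.35, 0.40, 0.30, 0.25],
    [0.35, 1.00, 0.30, 0.20, 0.15],
    [0.40, 0.30, 1.00, 0.30, 0.25],
    [0.30, 0.20, 0.30, 1.00, 0.30],
    [0.25, 0.15, 0.25, 0.30, 1.00],
]


def mat_vec(m, v):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(row[j] * v[j] for j in range(len(v))) for row in m]


def dominant_factor(corr, iters=500):
    """Largest eigenvalue of `corr` and the loadings of the first principal component."""
    v = [1.0] * len(corr)
    for _ in range(iters):
        w = mat_vec(corr, v)
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(a * b for a, b in zip(v, mat_vec(corr, v)))  # Rayleigh quotient
    loadings = [math.sqrt(eigenvalue) * x for x in v]
    return eigenvalue, loadings


eigenvalue, loadings = dominant_factor(CORR)
for name, loading in zip(PRIMARIES, loadings):
    print(f"{name:>26}: {loading:.2f}")
```

Every primary loads positively on the one broad factor, yet the loadings differ, which is one way of seeing why an individual's extraversion is better described by his five separate scores than by a single number.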

Although the second massive second order factor thus quickly
fitted into a concept long popularly discussed, the first large pattern,
involving ego weakness, ergic tension, etc., as I have described,
could not be immediately interpreted. However, when Scheier began
his work with objective anxiety measurements, he included the
16 Personality Factor Questionnaire in his study, and when he
determined the loading of his objective test anxiety factor on these
questionnaire measurements, the pattern of loadings turned out to
be exactly the same as that found in the factorization of the
questionnaire itself. In other words, the second order factor among the
questionnaires is identical with the first order factor among the
objective test measurements. On looking at the psychological picture,
this begins to make good sense, for it tells us that anxiety
is contributed to by ego weakness, by high ergic tension, that is
to say, frustration of drive expression, and by the [...]
guilt proneness component.
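The text reports that the two loading patterns "turned out to be exactly the same." One conventional way to quantify such agreement, not necessarily the one Scheier used, is Tucker's coefficient of congruence: the cosine between two columns of loadings on the same variables. The loading values below are hypothetical.

```python
import math


def tucker_congruence(x, y):
    """Tucker's congruence coefficient between two loading vectors on the same variables."""
    if len(x) != len(y):
        raise ValueError("loading vectors must cover the same variables")
    num = sum(a * b for a, b in zip(x, y))
    den = math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))
    return num / den


# Hypothetical loadings of five questionnaire primaries (ego weakness, ergic tension,
# guilt proneness, and two unrelated factors) on each of the two patterns.
second_order_in_questionnaire = [0.60, 0.70, 0.65, -0.05, 0.10]
objective_anxiety_factor = [0.55, 0.65, 0.60, 0.00, 0.05]

print(round(tucker_congruence(second_order_in_questionnaire, objective_anxiety_factor), 3))
```

Unlike a product-moment correlation, the coefficient is not centered on the mean loading, so it respects the distinction between near-zero and substantial loadings that matters when matching factor patterns; values near 1 indicate matching patterns.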
Without time for expanding these comments, I would point out
that we now have three instances where a second order factor in
the questionnaire realm has become recognized and confirmed as
a first order factor in objective, instrumental tests. The perplexing
lack of relationship between the questionnaire and behavior rating
factors on the one hand, which mutually agree well, and the
objective test factors on the other, which have previously defied
alignment, therefore begins to resolve itself. The objective test factors
are second order factors to the primaries found in the other media
of observation. This is only one illustration of the increasing
interconnection and illumination of structure which is now beginning to
take place in the factor analytic realm. However, I want to add a
technical word of warning. I think this fitting together of the jigsaw
puzzle can continue only insofar as we all give far more attention
to good technical precision in our first order factor analyses than
has been typical of work in the last ten years. In particular, far
greater diligence is necessary in getting accurate rotation, to a
plateau of maximum percentage of variables in the hyperplane,
whenever simple structure is alleged to be obtained.
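The hyperplane criterion can be made operational as a count: the percentage of loadings that lie within a narrow band around zero. The sketch below assumes a band of ±0.10 and, to stay short, searches only orthogonal rotations of a two-factor solution in whole-degree steps. Cattell himself favored oblique rotation, so this is a toy illustration of the criterion, not his procedure; the loadings are hypothetical.

```python
import math


def hyperplane_percentage(loadings, width=0.10):
    """Percent of all entries in a variables-by-factors loading matrix within +/- width of zero."""
    values = [x for row in loadings for x in row]
    inside = sum(1 for x in values if abs(x) <= width)
    return 100.0 * inside / len(values)


def rotate_two_factors(loadings, degrees):
    """Apply an orthogonal rotation of `degrees` to a two-factor loading matrix."""
    t = math.radians(degrees)
    c, s = math.cos(t), math.sin(t)
    return [(a * c - b * s, a * s + b * c) for a, b in loadings]


def best_rotation(loadings, width=0.10):
    """Scan whole-degree angles; return (angle, percent) with the highest hyperplane count."""
    best_angle, best_pct = 0, hyperplane_percentage(loadings, width)
    for deg in range(1, 360):
        pct = hyperplane_percentage(rotate_two_factors(loadings, deg), width)
        if pct > best_pct:
            best_angle, best_pct = deg, pct
    return best_angle, best_pct


# Hypothetical simple-structure loadings, deliberately turned 30 degrees out of position.
SIMPLE = [(0.70, 0.00), (0.60, 0.05), (0.65, -0.03),
          (0.00, 0.70), (0.04, 0.60), (-0.02, 0.65)]
MISPLACED = rotate_two_factors(SIMPLE, 30)

print(hyperplane_percentage(MISPLACED))
angle, pct = best_rotation(MISPLACED)
print(angle, pct)
```

Angles 90 degrees apart reach the same maximum because they differ only in the order and sign of the factors, and in this toy a run of neighboring angles also shares the maximum count, a small analogue of the plateau mentioned in the text.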


Avoidance of General Theory 

If I have referred insufficiently to general theories, it is because I
believe that psychology particularly needs to guard itself at this
stage of development from getting into cloudy regions of grandiose
theory, instead of seeking well established laws and concepts,
susceptible to accurate measurement. In a healthy science, wider
theories arise from well determined regularities, which we call laws.
If you give yourselves a few seconds' thought on this, I think you
will realize that the unquestionable, dependable laws in psychology
can probably be counted on your fingers, not in the hundreds with
which they can be counted in the physical sciences. This is both
a cause and a consequence of the psychologist's readiness to escape,
at the drop of a hat, into philosophy. The despair which motivates
this escapism is a justifiable reactive depression to the very small
amount of progress made in psychology relative to the enormous
amount of labor that has gone into research over the past thirty
years, especially in the clinical and personality area.

If we get too dissatisfied with ourselves in relation to
the physicists' accomplishments, we can always look at the [...]
more chaotic and barren backyard of our neighbors,
the sociologists. We are not quite at the head of the list in
terms of getting nowhere in a great hurry. Several shrewd observers
have pointed out one feature that constantly seems to distinguish
research in the social sciences from research in the physical sciences:
the physical sciences typically show an architectonic growth, in
terms of one research building constructively upon another,
whereas in the social sciences there are an enormous number of
unrepeated researches, in which particular variables are used by a
particular investigator and never touched again by anyone. The
resulting scenery is a shanty town of one story hovels instead of
the skyscrapers which the physical sciences build.

I think there are three major, and doubtless many minor, reasons
for this. First, we have tried to ape the physical sciences by
concentrating on the univariate controlled experimental method,
instead of the multivariate experimental method which alone is truly
adapted to the far more numerous variables and complex
determination with which we deal. Second, our work needs far more
mathematical discipline than our students have been willing to
acquire. Third, there has been insufficient social organization of
research. By this last I mean that we have been inclined to ascribe
our failures wholly to defective technical methods when frequently
they are due to defective coordination of research. Better social
organization can come either from the organization of teams and
institutions or through more sensitive conscience and vision in the
individual research worker. One of the immature features of our
science seems to be a bizarre teenager sense of honor, which holds
that no individual with claims to creativity could possibly use the
same variables as any other individual, and certainly [...]
replicate any extensive experiments previously done [...].
[...] what I would call magpie research, in which the investigator
seems attracted for purely emotional reasons by the glitter of [...]
variable or piece of apparatus, e.g., the psychogalvanometer,
prejudice, colored inkblots, sociometric count, or [...],
and centers his research on a mere variable without any broader
theoretical or conceptual framework.

Solution: Organization and Division of Labor

I believe a great deal of progress could be [...]
practice indeed, namely, that of putting other people's [...]
into one's correlation matrix. People can continue to [...]
different theories about what is happening behind these variables,
but at least if we linked hands on some marker variables we could
with comparative certainty begin to relate and debate the theories
through the intercorrelation matrices. We do this as a routine
procedure in our own factor analyses, taking a minimum of two marker
variables for each well known factor, for example, from the work
of Eysenck, Guilford, and other previous researches, when we start a
new investigation on the next factor of theoretical interest. Factor
analyses carried out without such markers from the known terra
firma are strictly uninterpretable. They inhabit a solipsistic universe
of their own, with no past and very little future, and might as well
be carried out upon the moon. On the other hand, an overlap of
variables must, sooner or later, mean an overlap and integration of
ideas. If people want to be productive, they should get their
variables together.
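The marker-variable practice can be sketched as a bookkeeping step after extraction and rotation: for each previously established factor, find the new factor on which its markers load most strongly, and decline to identify a factor represented by fewer than two markers. The function, variable names, factor labels, and loadings below are all hypothetical.

```python
def identify_factors(loadings, markers, min_markers=2):
    """Match known factors to newly extracted factors through their marker variables.

    loadings: dict mapping variable name -> list of loadings on the new factors.
    markers:  dict mapping known factor label -> list of its marker variable names.
    Returns a dict mapping each known factor to the index of the new factor on which
    its markers have the largest mean absolute loading.
    """
    matches = {}
    for factor, names in markers.items():
        present = [n for n in names if n in loadings]
        if len(present) < min_markers:
            raise ValueError(f"{factor}: only {len(present)} marker(s) in the matrix")
        n_new = len(loadings[present[0]])
        means = [
            sum(abs(loadings[n][k]) for n in present) / len(present)
            for k in range(n_new)
        ]
        matches[factor] = max(range(n_new), key=lambda k: means[k])
    return matches


# Hypothetical rotated loadings on three new factors.
LOADINGS = {
    "marker_A1": [0.62, 0.05, 0.10],
    "marker_A2": [0.55, -0.02, 0.08],
    "marker_B1": [0.04, 0.08, 0.58],
    "marker_B2": [0.10, 0.03, 0.49],
    "new_test_1": [0.30, 0.45, 0.12],
}
MARKERS = {"known factor A": ["marker_A1", "marker_A2"],
           "known factor B": ["marker_B1", "marker_B2"]}

print(identify_factors(LOADINGS, MARKERS))
```

In this toy matrix the factor at index 1, which none of the markers claims, is the candidate for the "next factor of theoretical interest."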

This brings me to consider an important respect in which the
development of our own personality assessment researches may be
considered to lack integration. The charge must be admitted that
factor analysts are so engrossed in establishing the form and nature
of factors, with statistical elegance, in laboratory measures, that they
have made quite inadequate effort to show the clinician, the
educator, the industrial psychologist, and others what these factors
mean in more popular terms, and particularly to interpret them in
terms with which the general psychological theorist is familiar. But
let us not mistake the principle of division of labor, which is
necessary in a highly specialized world, for any lack of integration, which
is not. It happens that a rather unusual assemblage of skills,
apparatus, and organized facilities is necessary for the effective
advance of knowledge through applying factor analysis to establishing
functional unities in behavior.

One needs, first, research time, resources, and subjects enough to
permit lengthy measurement of a large range of variables; second,
a research team with talents in the direction of proceeding from
general theoretical concepts about personality to actual
miniature-situational objective test designs; third, a sure touch in the finer
statistical issues in the area of multivariate analysis; and last, an
electronic computer, well furnished with programs for the principal
factor extraction and rotation procedures. The combination of
mathematical competence and clinical insight in psychologists, as now
trained, is far from common, and the other conditions are positively
rare, so it is not surprising that there are fewer than half a dozen
laboratories in the English speaking world, and none that I know
of outside it, where this basic research is being intensively pursued.
However, although such [...] expanded to the requisite number,
there is no need for them to be so
few as they are. Any large university psychology department should 
be able to organize an effective laboratory in this area. Viewed in 
broader terms of national effort there are unmistakable similarities 
to our backwardness in the area of intercontinental ballistic mis- 
siles. Both have the pattern of insufficient planning of funds and 
facilities for the scale of work required, and the lack of ability to 
bring representatives of different departments together. In our case 
the coordination failure has shown itself especially, until recently, 
in obtaining strong teams combining clinicians and multivariate sta- 
tistical experimenters. 

The second necessary objective in the organization of research
is the expediting of external cultural validation of these functional
unities, once they are established and have had good tests set up
for them in the laboratory. I believe it cannot be too much stressed
that this is a task which cannot effectively be done by the same
team or organization as had been designed for the basic internal
validation just described. Instead this is the proper field for the
vaster group of professional, applied psychologists in clinical,
educational, and industrial research. There is always a lag between
the conclusion of laboratory research and its use in the field, and
one wonders if this lag could not, with a little better cooperation,
be cut down from ten to five years. We all know the theory that
if a man in the backwoods invents a better mousetrap the world
will, in a few days, make a beaten track to his door, but in an age
of advertising and vested interests he is more likely to be paid to
bury the invention.

Through the momentum of custom alone, and the ego
involvements of personal prowess with ink blots or Binet, the majority of
clinical and educational psychologists are inclined to continue with
the instruments they were taught to use at college, though
instruments of twice as high a validity may be [...]
research reports. For example, many clinicians are only beginning
to realize that the factored questionnaire of today is something quite
different from the ad hoc questionnaires of former years, and there
is quite a good probability that it would give them [...]
diagnoses and prognoses than are obtainable from [...].
Others, failing to realize the modern demands for research
specialization which I have just stressed, seem to suppose that
factor analysts will not only investigate structure but also supply
them with the clinical validities of such tests, and they sit back
and wait, ignoring their own vital role in test development. But
the test construction itself is today a full time and highly specialized
task. The amount of planning, skill, and labor involved in
factorizing literally hundreds of variables, checking by replication the
factor structure in independent studies, and constructing unifactor
scales from such variables is enormously greater than that involved
in the older style questionnaires and tests, which most of us could
make up almost overnight. It is, however, true that the factor
analyst has usually been content to dump his finished product before
the clinician in the journals and to return to his computer and his
laboratory.

Unfortunately, even the applied psychologist who realizes his
role in the teamwork of science has been inclined to look at this
abstract contrivance with about as much enthusiasm and insight
as a Bikini native looking at an atom bomb. He rightly fears that
it is something which will involve radical changes in his mode of
practice and thinking. Often he is inclined to defend himself from
having to think in objective structural concepts by saying that a
factor is an artificial mathematical monstrosity which will have no
potency in his human clinical world. The result is that though a
number of well factored tests highly relevant to clinical practice
have become available over about the last five years, the activity
which should have led to their external validation has been utterly
inadequate. The important point, however, is that on the few
occasions when their external validity has been crucially tried, it has
turned out to be very good.

Turning from practice to basic theory, one notes that these
external validities are vital to the full interpretation of personality
factors and structural relations, for factors cannot be interpreted
in the laboratory alone. In the case of the 16 P.F. Test, external
validation has come in rapidly and freely, leading to great strides
in the interpretation of these factors, which only five years ago had
little attached to them except the letters of the alphabet, just like
the nutritional vitamins before chemical analysis and synthesis.
Thus, although factor C might be tentatively interpreted from its
descriptive ratings and questionnaire responses as ego strength
[...] weakness, this hypothesis only received the degree of
[...] with the ensuing demonstration that
it [...] and positively correlated with leadership in face-to-face
groups, that it is higher in successful than unsuccessful
psychiatric technicians, that it correlates with school achievement
among students of the same intelligence level, that it is negatively
correlated with accident proneness, that it is substantially negatively
correlated with anxiety proneness, and so on.
Similarly, the findings that high F factor, or surgency versus
desurgency, is substantially positively correlated with being chosen and
voted a group leader, that it has one of the principal loadings in the
second order extraversion factor, that it increases with alcohol, that
it declines steadily with age from adolescence to middle age, that
its level is largely a product of environment rather than heredity,
and that it increases significantly under frontal lobotomy and under
psychotherapy, provided a valuable extension of the original factor
hypothesis that desurgency is a form of generalized inhibition,
associated with frontal lobe action and with frequency of punishing,
repressive past experience. This degree of insight into its nature
could never have been achieved from the direct content of the
factor, either in ratings or in the questionnaire responses.

Accordingly, the great need in the social organization of research
at the present moment is a concerted plan for taking all factor
analytically well established personality source traits and having
their social validities, their changes with age, their relevance to
clinical prognoses, their educational predictive value, etc.,
systematically examined. No one clinical, counseling, or other applied
psychological center can hope to do this alone or for all the factors.
But a planned division of labor, in which certain laboratories or
clinical centers make systematic studies of the life history of one
factor and others of another, could lead to an enormous increase in
the practical effectiveness of personality measurement in applied
psychology in the next five or ten years.

In conclusion, I hope I have given some convincing reasons why
the construction of personality measurement scales should be
wedded to concepts of personality structure, and some evidence that
the objective structuring of personality has come of age sufficiently
to make this possible. How soon this marriage will be fruitful, in
terms of major gains in the power and insightfulness of applied
psychology, depends on how soon teachers of applied psychology
cease thinking in terms of catalogues of tests and set out to teach
tests and measurements as an epilogue to courses in personality.
The psychology of structure and growth comes first; the tests are
merely an appendix to such an exposition. If after all this discussion
you were to ask me why I personally prefer factor scales to other
scales, e.g., simple homogeneous scales, I think I should have to
say because the former are psychologically interesting and the latter
are dull. When you are through with a complicated scaling ritual
you have perhaps at best eased a neurotic compulsion; but with
factor scales you can have a lot of fun finding how people tick.
Differential Validity in Some
Pattern Analytic Methods 

Louis L. McQuitty 
Michigan State University 


A theory of personality structure
is a starting point for the development of numerical methods for 
the objective assessment of personality. This chapter starts with a 
simple-minded theory concerning the way in which personality is 
structured. It outlines the theory and traces the development of 
a series of pattern analytic methods that have derived logically from 
the theory. The methods can be used to investigate the fruitfulness 
of the theory. 

Most clinical theories accept the concept of syndromes of
symptoms. A syndrome of symptoms is a combination of characteristics
that implies a disease of some kind. If a person is mentally ill, then 
the manifestation of this condition can presumably be described 
as a syndrome of behavioral symptoms. For every disease entity 
there is a unique syndrome of symptoms. There is presumed to be 
a one-to-one correspondence between disease entities on the one 
hand and syndromes of symptoms on the other. The process of 
diagnosis is to discover the syndrome of symptoms portrayed by a 
patient and then assign to him the disease corresponding to that 
syndrome. 

An examination of syndromes of symptoms reveals that many 
symptoms are common to more than one syndrome. There is not a
one-to-one correspondence between symptoms and syndromes. In
other words, there is not one symptom that is unique to each
syndrome, such that if a given symptom is present then a corresponding
syndrome is known to be present. Nearly every symptom can and
does occur in more than one syndrome. Analogously, almost every
symptom can and does occur as a manifestation of more than one
disease; there is not a one-to-one correspondence between symptoms
and diseases.

In the field of mental health, symptoms are characteristic
responses, and syndromes of them are patterns of characteristic
responses. Following this translation still further, mental diseases
are personality types. This approach gives a particular definition
to the concept of a personality type. The personality type is the
internal property that causes a person to portray a particular pattern
of responses. It is a hypothetical construct; it is assumed in order to
explain why an individual gives a particular pattern of responses,
just as a disease is interpreted to be the cause of a particular
syndrome of symptoms even when nothing more than the syndrome
has been observed in the patient.

There is presumed to be a one-to-one correspondence between
personality types and patterns of responses, such that if a given
pattern of responses is characteristic of a person, it means that the
individual possesses a particular personality type. There is not,
however, presumed to be a one-to-one correspondence between
individual responses and personality types. Rather, most individual
responses are presumed to be characteristic of more than one type;
different types can cause the same response. As a consequence,
a given response sometimes means one personality type and
sometimes another type, just as a given symptom sometimes means one
disease and at other times a quite different disease.
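These assumptions amount to two different mappings: complete patterns of responses map one-to-one onto types, while single responses map onto several types. The toy Python sketch below, with invented response labels and type names, builds both and classifies a person only when his whole pattern matches.

```python
from collections import defaultdict

# Hypothetical patterns: each complete pattern of responses belongs to exactly one type.
PATTERN_TO_TYPE = {
    frozenset({"r1", "r2", "r3"}): "type A",
    frozenset({"r1", "r4", "r5"}): "type B",
    frozenset({"r2", "r5", "r6"}): "type C",
}


def types_for_each_response(pattern_to_type):
    """Invert the table: the set of types in which each single response occurs."""
    index = defaultdict(set)
    for pattern, type_name in pattern_to_type.items():
        for response in pattern:
            index[response].add(type_name)
    return dict(index)


def classify(observed, pattern_to_type):
    """Assign a type only when the whole observed pattern matches; otherwise None."""
    return pattern_to_type.get(frozenset(observed))


print(types_for_each_response(PATTERN_TO_TYPE)["r1"])  # a single response is ambiguous
print(classify({"r1", "r4", "r5"}, PATTERN_TO_TYPE))   # the full pattern is not
```

Exact matching is deliberately naive: it says nothing about partial patterns or about persons characterized by more than one type, which is the kind of difficulty the numerical classification methods of this chapter set out to address.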

Carrying the analogy of personality types to disease entities still
further, it is helpful to realize that a person can have more than one
disease at a time. Analogously, it is assumed that a person can be
characterized by more than one personality type. In fact, it is
assumed that he can be characterized by many personality types.
The number of types desirable to attribute to a person will depend
on the level of abstraction that we wish to achieve in classifying
people.

The Classification Problem

One problem with which we are concerned is the development
of numerical methods that can start with the symptoms
characteristic of patients and that can be used to classify the patients
objectively into meaningful disease categories. An analogous problem
is to start with responses to the individual items of a test, using
these to classify the subjects into meaningful personality types.

The classification problem is complicated by the lack of a
one-to-one correspondence between responses and personality types. In one
person, the response to a given test item may be determined by 
one personality type but in another person the same response to 
the item may result from a different personality type. For example, 
a medical type of person may respond correctly to a question about 
chemistry because he has learned it in the advanced study of medi- 
cine. A chemical engineering type of person, on the other hand, may 
respond correctly to the same item because he learned it in the 
advanced study of engineering, but the two types of persons would 
possess this identical knowledge in different patterns of other in- 
formation about physiology and mathematics. 

The fact that the correct answer to an item is caused by two 
different personality types means that it has differential validity. In 
the one case, it indicates a medical type, and in the other it indicates 
an engineering type. In order to know which of these two types is 
indicated by a correct answer to the chemical item, we must know 
the answers to other items. If the correct answer to the chemical 
item occurs in a pattern of correct answers about physiology and in- 
correct ones about mathematics we then know that the correct 
chemical answer indicates a medical rather than an engineering 
type. 
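The chemistry example can be written as a small decision rule. The medical branch follows the text; the engineering branch, in which correct mathematics and incorrect physiology answers accompany the correct chemistry answer, is an assumed mirror image of it, and the function name is hypothetical.

```python
from typing import Optional


def type_indicated(chemistry: bool, physiology: bool, mathematics: bool) -> Optional[str]:
    """Interpret a correct chemistry answer in the light of the other two answers.

    True means the item was answered correctly. The medical rule follows the text;
    the engineering rule is an assumed mirror image of it.
    """
    if not chemistry:
        return None  # the example only interprets a correct chemistry answer
    if physiology and not mathematics:
        return "medical"
    if mathematics and not physiology:
        return "engineering"
    return "undetermined"  # the remaining answers do not separate the two types


print(type_indicated(True, True, False))  # medical
print(type_indicated(True, False, True))  # engineering
```

The same correct chemistry answer thus points to different types depending on the pattern it occurs in, which is what is meant by its differential validity.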

Differentiation Versus Discovery of Types 

Because the chemical answer has high validity for indicating
both engineering and medical types, it necessarily has low
discrimination for differentiating between the two types; it would be
discarded as an item in a test designed to differentiate between the
two types. Instead we would use the mathematical and
physiological items in differentiating between engineers and doctors. But
we are concerned here with a more difficult problem in objective
classification. In the example, we assumed that the engineering and
medical types are given; we know who are doctors and who are
engineers. Instead of starting with known categories of subjects,
we wish to start with characteristic responses to selected items and
classify subjects into categories which are determined in some
objective fashion by the responses to the test items.

For our purpose we require valid items; the items we seek must
be indicative of types; but we cannot insist on items that are
invariantly valid in the sense that they measure the same thing or
things for all persons. This latter kind of item, with invariant
validity, is the kind desired in most test construction methods, as
illustrated especially well in factor analysis. In factor analysis, an
item is assumed to measure the same thing for all people. This is
not to say that it measures only one factor; it may in fact assess
several factors, as indicated by loadings on several factors. The
point is that whatever an item measures it is assumed to do this
with near equal efficiency for all people of the universe under study.
In this sense, its degree of validity is invariant; it measures nearly
equally well the same stuff for all people.

Items of invariant validity are not the ones with which to start
in an effort to isolate types, when the types are to be determined
by the responses to test items themselves. The reason for this is
that we have assumed there is not a one-to-one relationship between
types and responses. There is no response which is known to mean
one type and only one type.

Not being able to use items of high invariant validity, we seek
then the next best thing, viz., items of high differential validity.
Responses to these items manifest types, but they manifest different
types in different people; this is the sense in which they have
differential validity. Even though the items by themselves have
differential validity, patterns of responses to them are presumed
to have invariant validity; each pattern of responses to these items
is presumed to mean one and only one personality type, thus
manifesting a one-to-one correspondence between types and
patterns.

Our purpose is to isolate response patterns which have invariant
validity with respect to types. In order to do this we first attempt
to define a type. In this effort, there are alternative ways of
proceeding. We could, for example, attempt to define types in such a
manner that we could recognize their manifestations observationally
in people. We could then observe people and select representative
types. We could study them and attempt to write items to which
the types would give differential patterns of answers. However, the
observational isolation of types has not proved particularly fruitful.

We have used a different initial approach in our efforts to study
types. Eventually we will wish to combine the two approaches,
using them in mutually assistant fashion. Our approach is to give
a statistical-like definition to both patterns of responses and
personality types. We use the statistical definition to enable us to
develop the techniques of analysis for isolating patterns and types.
Then we propose to study both the response patterns and repre- 
sentatives of the types in order to learn more about the character- 
istics of the types. Our first patterns will doubtlessly be incomplete 
and overlapping but nevertheless subject to refinement, elaboration, 
and improvement through repeated application of the methods 
used. 

We first assume that we have a test that contains items with 
differential validity. Considerable time should be given to the 
selection of such items in terms of typological theories. We have not
yet addressed ourselves to this problem, being concerned initially 
with the statistical definition of types and the methods of numerical 
analysis for isolating both patterns and types. 


Some Pattern-Analytic Methods 

Since several pattern-analytic methods have already been de- 
veloped, it will be helpful to review two of them in relation to our 
set of assumptions before outlining our statistical definitions of 
types and the methods of analysis which flow from them. 

Two major kinds of pattern-analytic methods are appropriate to 
two different classes of data, ordered and unordered. The responses 
to the individual items of a test are an illustration of unordered data, 
at least until the responses have been allocated to a scale according 
to some operational definition. After allocation to a scale, responses 
to the items then illustrate ordered data. 


Profile Analysis 

One general pattern-analytic approach is profile analysis; the
items are first ordered to a scale, and responses to them are used to
assign each subject a score on the scale. This operation is performed
with respect to several scales so that every subject has a profile of
standings on the scales. A further step in the method is to classify
the subjects into categories according to some index of similarity
among profiles, as in a method by Cronbach and Gleser (1). In this
approach, the meaning of a standing on any one scale is assumed to
depend on the standing of the subject on the other scales. This
can be illustrated with reference to two categories of profiles,
A and B. The profiles for category A are all very much
alike, as are also those for category B, but those of A are different
from those of B, except for Scale 3, on which the subjects of both
A and B all have the same standing, as shown in Figure 1.

If we assume that the profiles of categories A and B are
manifestations of Types I and II, respectively, the common standing on
Scale 3 has two different meanings. In profile A, it means Type I, but
in profile B, it means Type II. This result shows that the standing has
differential validity. The items of a scale, however, are usually
chosen to minimize differential validity. In building a scale, we
usually attempt to define a unitary trait that is common to all people,
and then we attempt to select items that measure it for all of
our subjects; we attempt thereby to select items with high invariant
validity. But items must have relatively low differential validity
to the extent that they have high invariant validity. If an item
measures one thing well for all subjects, as required by invariant
validity, it cannot then measure more than one thing, as required by
differential validity. Efforts to select items with invariant validity
in building scales necessarily limit the potential differential
validity of the scales. If differential validity is nevertheless found
in scales, the result suggests the worthwhileness of searching for items
with high differential validity and analyzing them in ways
particularly designed to discover the manifestation of types.

Figure 1. Hypothetical Profiles Illustrating Differential Validity for
the Common Standing on Scale 3 (Scales 1 through 5; profiles for
individuals of Type I in Category A and of Type II in Category B).
Lubin's Approach

Not all pattern-analytic methods which have been applied to
unordered data, viz., the responses to the individual items of a test,
have been applied in a manner that selects items with high
differential validity. Some pattern-analytic approaches have been applied
instead in ways which maximize invariant validity and thereby tend
to minimize differential validity. An example is a study by Lubin
(2) in which he used what I have called an accumulative method of
pattern analysis (4).

Lubin's approach involved selecting items in relation to an
external criterion for a pattern-analytic method of scoring. He first
selected from a group of many items the one item x that was most
highly related to the external criterion. Next he treated this item
successively in a pair with every other item until he found the one
pair of items x and y that had the highest pattern score with the
criterion. Proceeding in an analogous fashion, he retained items
x and y and tried them successively with every other item until he
found the triplet x, y, and z that had the highest pattern score with
the external criterion. Thus he selected his items accumulatively,
one item at a time. By selecting the one item with the highest
relationship to the criterion, however, he selected an item with high
invariant validity. This action limited not only the differential
validity of the item selected but also the interaction variance it
can have with other items, and consequently the differential validity
of the next item selected.
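The accumulative procedure is a greedy forward selection, and a short sketch shows how it can miss items whose value lies only in combination. Lubin's actual pattern score is not described here, so the sketch substitutes a hypothetical one: each response configuration is assigned its majority criterion value, and the score is the resulting hit rate. In the invented data, the criterion is fully determined by the pattern on items 0 and 1, while item 2 is the best single predictor.

```python
from collections import Counter, defaultdict

def pattern_hit_rate(responses, criterion, items):
    """Hypothetical pattern score: assign each response configuration on `items`
    to its majority criterion value and return the proportion of hits."""
    cells = defaultdict(Counter)
    for row, c in zip(responses, criterion):
        cells[tuple(row[i] for i in items)][c] += 1
    hits = sum(cell.most_common(1)[0][1] for cell in cells.values())
    return hits / len(criterion)

def accumulative_selection(responses, criterion, k):
    """Greedy forward selection: keep the items already chosen and add, one at a
    time, the item giving the highest pattern score (ties go to the lowest index)."""
    chosen, remaining = [], list(range(len(responses[0])))
    for _ in range(k):
        best = max(remaining,
                   key=lambda j: pattern_hit_rate(responses, criterion, chosen + [j]))
        chosen.append(best)
        remaining.remove(best)
    return chosen

# Invented data: criterion = item 0 XOR item 1; item 2 matches it 6 times in 8.
responses = [
    [0, 0, 0], [0, 0, 0], [1, 1, 0], [1, 1, 1],
    [0, 1, 1], [0, 1, 1], [1, 0, 1], [1, 0, 0],
]
criterion = [0, 0, 0, 0, 1, 1, 1, 1]

print(accumulative_selection(responses, criterion, 2))  # [2, 0]
print(pattern_hit_rate(responses, criterion, [2, 0]))   # 0.75
print(pattern_hit_rate(responses, criterion, [0, 1]))   # 1.0
```

Starting from item 2, the greedy search never reaches the pair (0, 1), whose pattern score is perfect. This is the limitation on interaction variance described above.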

In the Lubin approach, the first pair of items must not only 
include the first item (with high invariant validity), but the pair 
itself must have high invariant validity with the criterion, thereby 
limiting the interaction variance that later items can have with those
already selected. This whole process continues to limit interaction 
variance. Since interaction variance is the essence of differential 
validity, differential validity is continually limited throughout the 
selection process. 

This outline of how the Lubin method of selecting items for
pattern-analytic scoring influences validity is not meant to argue that the
Lubin study should never have been performed. On the contrary,
it becomes an example of an effective use of the extreme case. If
pattern-analytic scoring of items of this kind had proved more
successful than the usual methods of selection and scoring, it would
have been a strong endorsement of all items for pattern-analytic
scoring. Lubin, however, did not find his approach to be superior



DIFFERENTIAL VALIDITY IN ANALYTIC METHODS 




to the usual methods. We only have evidence that items with 
relatively high invariant validity do not yield unusually promising 
pattern-analytic scores, leaving the possibility that items may still 
be found with differential validity and promising pattern-analytic 
scoring. 


Methods for Selecting Items with Differential Validity 

In our own approaches the methods are appropriate to the
selection of items with high differential validity, where differential
validity is defined to mean that an item response is determined by
different internal constructs in different people. The occurrence of
the same item response in two different patterns of responses to
other items is assumed to be tentative evidence that an item
response is determined by two different constructs. Thus, if two
subjects both answer a difficult chemical question correctly but the
first subject also answers many physiological questions correctly
while failing mathematical items, and the other subject answers
the mathematical questions correctly while failing the physiological
items, we might say that the first subject answered the chemical
question correctly because he was a medical type, and the second
subject answered correctly because he was an engineering type.
We would thus be treating the concept of “type” as a postulated,
internal construct, attributing to it the power of determining
patterns of answers to items.

By seeking item responses with differential validity, as evidenced by
their occurrence in different response patterns, we are in fact seeking
items with high interaction variance. We want item responses
which have various meanings depending on the combination of
other item responses with which they occur.
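Interaction variance can be shown with the usual two-way decomposition of a table of criterion means, one cell for each combination of two item responses. The sketch is illustrative and assumes a balanced 2 x 2 table with one mean per cell; the means are invented. In the first table, each response raises the criterion in one context and lowers it in the other, so all the variance is interaction; in the second, the effects simply add, and the interaction is zero.

```python
def two_by_two_effects(means):
    """Split a balanced 2 x 2 table of cell means into grand mean, row effects,
    column effects, and interaction residuals (cell - grand - row - column)."""
    grand = sum(sum(row) for row in means) / 4
    row_eff = [sum(row) / 2 - grand for row in means]
    col_eff = [(means[0][j] + means[1][j]) / 2 - grand for j in range(2)]
    inter = [[means[i][j] - grand - row_eff[i] - col_eff[j] for j in range(2)]
             for i in range(2)]
    return grand, row_eff, col_eff, inter

def sums_of_squares(means):
    """Sums of squares over the four cells for rows, columns, and interaction."""
    _, row_eff, col_eff, inter = two_by_two_effects(means)
    ss_rows = 2 * sum(e * e for e in row_eff)
    ss_cols = 2 * sum(e * e for e in col_eff)
    ss_inter = sum(v * v for row in inter for v in row)
    return ss_rows, ss_cols, ss_inter

# Rows: response to item x (0, 1); columns: response to item y (0, 1).
# Cells: hypothetical mean criterion scores for people with that combination.
crossed = [[0.0, 1.0],
           [1.0, 0.0]]   # item x means opposite things depending on item y
additive = [[0.2, 0.4],
            [0.6, 0.8]]  # item x adds the same amount whatever the answer to y

print(sums_of_squares(crossed))  # (0.0, 0.0, 1.0)
print(sums_of_squares(additive))
```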

In developing our approaches, we have assumed that typological
theories are relatively inadequate; we suspect that they do not
describe the types that will ultimately prove most fruitful in
objective, numerical analyses. Not knowing the nature of the types to
hypothesize, we have tried to develop methods that depend
maximally on the concatenation in the data and minimally on the
assumptions implicit in the method of analysis.

We have not known the level of abstraction to apply in the
isolation of types. Consequently, we have developed methods for a
hierarchical classification of response patterns. At the lowest level
of classification, there is very little abstraction; the subjects are
classified into many categories, and every category contains few
subjects, all of whom have relatively many common responses. As
the classification proceeds to successively higher levels
of classification, there is more abstracting; there are fewer categories
of subjects, and every category contains more subjects who have
fewer common responses. In other words, at the lower levels of
classification, relatively unique characteristics are instrumental in
determining the many types. As the classification proceeds, these
relatively unique characteristics are disregarded in favor of more
general ones which are descriptive of larger categories of people.
Thus in the course of the analyses we proceed from the unique
individual to relatively unique types, then to more and more
generalized types, until at the top level of classification we may have,
in the extreme case, only one type of person, with all members sharing
only a few common responses. This approach makes it possible to
compare the types of the successive levels in terms of such
considerations as meaningfulness, statistical significance, reliability,
validity, and the prediction of criteria, thereby providing insight into
which types might most fruitfully be regarded as in some sense
“real.”


Definitions of Types

By the approach just described, we have attempted to give
maximal influence to the data in determining what is to constitute
real types. Nevertheless, in developing objective numerical
procedures for the isolation of types, we must give sufficient meaning
to the concept of type so that statistical operations will flow logically
from the definition.


Binary Types 

Our first statistical definition was to define a type as a pair of
persons so chosen that each member of the pair is more like the
other member than he is like any other person. Types of this kind
were arbitrarily called species to emphasize, by analogy with the
Linnaean botanical classification system, that the types are at the
lowest level of classification, as shown in Figure 2.

Similarly, a genus was defined as a pair of species so chosen that
each member of the pair is more like the other member than it is
like any other species. In an analogous fashion, families, orders,
classes, etc., were defined. In this approach, every species contained
two persons, every genus four persons, every family eight persons,
and so on in a geometric progression, so that the types of each
successive level are twice as large in number of persons as those
of the next lower level. The method which derives from these
definitions of types is called binary agreement analysis (6).
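A minimal sketch of the species-level step, assuming (for illustration) that the agreement score between two persons is the number of items they answer identically. A species is then a reciprocal pair: each member's highest score is with the other. The four response records are invented.

```python
from itertools import combinations

def agreement_matrix(responses):
    """Assumed agreement score: number of items two people answer identically."""
    n = len(responses)
    m = [[0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        m[i][j] = m[j][i] = sum(a == b for a, b in zip(responses[i], responses[j]))
    return m

def most_like(m, i):
    """The person with whom i has his highest agreement score (ties: lowest index)."""
    return max((j for j in range(len(m)) if j != i), key=lambda j: m[i][j])

def reciprocal_pairs(m):
    """Binary species: pairs in which each member is the other's highest partner."""
    pairs = []
    for i in range(len(m)):
        j = most_like(m, i)
        if i < j and most_like(m, j) == i:
            pairs.append((i, j))
    return pairs

responses = [
    [1, 1, 1, 0, 0, 0],
    [1, 1, 1, 0, 0, 1],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1, 1],
]
m = agreement_matrix(responses)
print(reciprocal_pairs(m))  # [(0, 1), (2, 3)]
```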

The emphasis on a classification by twos in the definition of types
was not based on a typological theory; it was done, instead, in
order to simplify the statistical operations and thus facilitate the
development of numerical methods for classifying people, with the
realization that an approximate method might enhance the development
of a more sophisticated solution.



Figure 2. Levels of binary classification, up to Class.



In developing and applying binary agreement analysis, a particular
problem stood out as a result of the data analyzed: some highest
agreement scores were reciprocal and some were not. Consider,
for example, the matrix of Table 1.

In this matrix subject A has his highest agreement score with subject B,
and B, in turn, has his highest score with A. This is a reciprocal
agreement score in the sense that not only is A highest with B, but
B is also highest with A. Some highest agreement scores are
non-reciprocal, as illustrated in the one mediating between individuals
I and J; I is highest with J, but J is highest with K.
The definition of types as used in binary agreement analysis was
entirely satisfactory so long as highest agreement scores were
reciprocal, but when they were non-reciprocal, a problem arose. In
the above example, if I were classified with J because it is highest
with J, then J could not be classified with K, with which it is highest,
so long as we continued to classify in pairs exclusively. In
binary agreement analysis, we solved this problem arbitrarily. We
classified J with K rather than J with I if and only if the score for
J-K was larger than the one for I-J. Individual I was then
classified at the species level in terms of its second highest agreement
score, i.e., with L in the matrix of Table 1. In cases such as this,
species J-K and I-L usually come together at the genus level to
solve the problem in a more meaningful manner at the second level
of classification.
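One way to operationalize the arbitrary rule is to pair people in descending order of agreement score, forming a pair only when both members are still unpaired. J then goes with K because J-K exceeds I-J, and I falls back on its next highest partner, L. The sketch below is a hypothetical reading of the rule, not necessarily the original computation, and the scores are invented to match the I, J, K, L relationships described for Table 1.

```python
from itertools import combinations

def pair_species(m):
    """Pair people by descending agreement score; a pair is formed only if both
    members are still unpaired. With an odd N, one person is left over."""
    n = len(m)
    candidates = sorted(combinations(range(n), 2),
                        key=lambda p: m[p[0]][p[1]], reverse=True)
    paired, species = set(), []
    for i, j in candidates:
        if i not in paired and j not in paired:
            species.append((i, j))
            paired.update((i, j))
    return species

names = ["I", "J", "K", "L"]
# Invented scores: I is highest with J (9), but J is highest with K (12).
m = [
    [0, 9, 5, 7],
    [9, 0, 12, 3],
    [5, 12, 0, 4],
    [7, 3, 4, 0],
]
print([(names[a], names[b]) for a, b in pair_species(m)])  # [('J', 'K'), ('I', 'L')]
```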

Generalized Agreement Analysis 

The joint occurrence of reciprocal and non-reciprocal scores led
naturally to the concept of multiple reciprocity, where I, J, K, and L,
for example, might all be highest, each with respect to every other
one; the scores would all be tied, and they would be larger than any
score which any one of the individuals would have with any other
individual. Empirical data, however, do not generally reveal this
condition, yet theories of types argue for it. We assumed that the
failure of empirical data to reveal the condition is due to chance
error in raw data; we therefore developed a method that includes
a technique for correcting agreement scores for chance errors, called
generalized agreement analysis (3).

TABLE 1

Agreement Scores Between Individuals: Hypothetical Data

Once we had introduced the concept of corrected agreement
scores, we were able to use a more realistic definition of types in
the development of statistical methods for isolating them. We
defined a species as a category of subjects of such a nature that
everyone in a category is more like what is common to members of the
category than he is like anyone in any other category. In this
approach, a species of two often has a larger corrected agreement
score when it grows into one of three, because of the greater
dependability of agreements by three individuals over those by
two. This fact enables individual I of Table 1 to have a higher
corrected score with J-K than it does with L, thus requiring that I
be classified initially in a category with J-K, with which it has its
highest corrected agreement score.

Elementary Linkage Analysis 

A criticism of generalized agreement analysis is that it is
laborious. However, the fact that it works enables us to propose a
simple definition of types which can be applied to a matrix of
scores to yield a rapid, numerical method for isolating types.

The method is called elementary linkage analysis (5). A type is
defined as a category of people of such a nature that everyone in
a category is more like someone else in the category than he is like
anyone in any other category. This definition is applied easily in
classifying people into categories once one has a matrix of
agreement scores (or some other index of likeness) between people.
Consider the matrix of Table 1 again. The first step is to underline the
highest entry in each column, as has already been done in Table 1.
We then select the highest entry in the entire matrix. In the present
example it is 115 and mediates between individuals A and B. These
two individuals are shown in Figure 3 with a double arrow pointing
between them to indicate that they are a reciprocal pair.
The highest entry in a matrix is always reciprocal. We next
find all individuals who have either A or B most like them.
This is done by reading across the rows of individuals A and B in
Table 1, and thus finding that C, D, and E have A most like them;
they are classified in Figure 3 with A. F and G, having B most like
them, are classified with B. These new additions to the type,
individuals C through G, are called first cousins because they join
directly to a member of the reciprocal pair. We then examine the
rows of the first cousins to see if anyone has these individuals most
like them; we find that H has G most like it. No other individual
has a first cousin most like it, and consequently no other first cousin
brings another individual into the type. Row H of Table 1 is
examined, and it is found that there is no individual who has second
cousin H most like it. Consequently, we have exhausted the first
type; no other individuals classify into this type. Individuals A
through H have now all been withdrawn from the matrix of Table
1 to constitute the first type. The method is then repeated with the
reduced matrix to isolate the next type, etc., until all individuals
are classified. The one reciprocal pair in the reduced matrix
mediates between individuals J-K, which are joined by first cousins I
and L to complete the second type, as shown in Figure 4, and exhaust
the matrix. In this approach everyone is classified into a category
so that he is more like some person in the category than he is like
anyone in any other category.
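The steps just described translate directly into a short routine. The sketch below follows them as stated: find each person's most-like partner, start from the highest remaining entry, add cousins until none join, withdraw the type, and repeat. The matrix is hypothetical: its numbers are invented so that the linkages match those described for Table 1 (A-B at 115; C, D, E to A; F, G to B; H to G; J-K reciprocal; I to J; L to K), and every other entry is set to 10. Ties are broken by list order, an assumption the text does not address.

```python
def elementary_linkage(m, labels):
    """Classify people so that each is more like someone in his own type than
    anyone in another type. `m` is a symmetric score matrix for at least two
    people; diagonal entries are ignored."""
    n = len(m)
    # Each person's "most like" partner: the highest entry in his column.
    most_like = {i: max((j for j in range(n) if j != i), key=lambda j: m[i][j])
                 for i in range(n)}
    remaining, types = set(range(n)), []
    while remaining:
        if len(remaining) == 1:  # defensive; does not arise with consistent partners
            types.append([labels[i] for i in remaining])
            break
        # The highest entry among unclassified people starts a type (reciprocal pair).
        a, b = max(((i, j) for i in sorted(remaining) for j in sorted(remaining) if i < j),
                   key=lambda p: m[p[0]][p[1]])
        members, frontier = {a, b}, {a, b}
        # Add first cousins, then second cousins, and so on, until nobody joins.
        while frontier:
            joiners = {k for k in remaining - members if most_like[k] in frontier}
            members |= joiners
            frontier = joiners
        remaining -= members  # withdraw the type; repeat on the reduced matrix
        types.append(sorted(labels[i] for i in members))
    return types

labels = list("ABCDEFGHIJKL")
pos = {name: k for k, name in enumerate(labels)}
high = {  # invented scores reproducing the linkages described in the text
    ("A", "B"): 115, ("A", "C"): 90, ("A", "D"): 85, ("A", "E"): 80,
    ("B", "F"): 88, ("B", "G"): 86, ("G", "H"): 70,
    ("J", "K"): 100, ("I", "J"): 95, ("K", "L"): 75,
}
m = [[10] * len(labels) for _ in labels]
for (p, q), s in high.items():
    m[pos[p]][pos[q]] = m[pos[q]][pos[p]] = s

print(elementary_linkage(m, labels))
# [['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H'], ['I', 'J', 'K', 'L']]
```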

Figure 4. Type II. 

When everyone has been classified into a category, the types may
be called species, arbitrarily. One can then take an index of
association between the species and repeat the process to classify the
species into genera, and analogously for higher levels of classification.
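The text leaves the index of association between species open. The sketch below assumes, purely for illustration, the mean agreement over all cross-species pairs of members; the resulting smaller matrix is what the linkage step would then be reapplied to. Scores and species are invented.

```python
def type_level_matrix(m, types):
    """Association between two types, taken here (an assumption) as the mean
    score over all pairs with one member in each type."""
    t = len(types)
    out = [[0.0] * t for _ in range(t)]
    for a in range(t):
        for b in range(a + 1, t):
            cross = [m[i][j] for i in types[a] for j in types[b]]
            out[a][b] = out[b][a] = sum(cross) / len(cross)
    return out

m = [
    [0, 9, 4, 2],
    [9, 0, 6, 0],
    [4, 6, 0, 8],
    [2, 0, 8, 0],
]
species = [[0, 1], [2, 3]]
print(type_level_matrix(m, species))  # [[0.0, 3.0], [3.0, 0.0]]
```

Other indices, such as the highest cross-pair score, would serve the same structural role; the choice changes which species combine into genera.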

Successive Agreement Analysis

A defect in elementary linkage analysis is the fact that the initial
classification for every individual depends on indices of association
between pairs of people; these are subject to error, and mistakes
made in classification at the first level might be reflected at later
levels.

After having developed elementary linkage analysis, our purpose
was to develop a rapid method which could use large sets of data
and classify in depth with maximal validity. An electronic computer
method was developed for this purpose. A type is here defined as
a category of people of such a nature that everyone in a category
is more like what is common to the members of his category than
he is like what is common to the members of any other category
of the same size. It is called successive agreement analysis to
indicate that an individual is first classified with the one person
with whom he has most in common, then with the two persons
with whom he has most in common, then with the three persons
with whom he has most in common, etc.
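To make “what is common to the members” computable, the sketch assumes that the agreement of a group is the number of items on which every member gives the same response. It then finds, by exhaustive search, the one person and the two persons with whom a given individual has most in common. This is the full version implied by the definition; the practiced method, as described next, does not classify everyone this way. Responses are invented.

```python
from itertools import combinations

def group_agreement(responses, members):
    """Assumed score: number of items on which every member gives the same response."""
    rows = [responses[i] for i in members]
    return sum(len(set(col)) == 1 for col in zip(*rows))

def best_companions(responses, person, size):
    """Exhaustively find the `size` other people with whom `person` has most in
    common (ties go to the first combination in lexicographic order)."""
    others = [i for i in range(len(responses)) if i != person]
    return max(combinations(others, size),
               key=lambda grp: group_agreement(responses, (person,) + grp))

responses = [
    [1, 1, 1, 1, 0, 0],
    [1, 1, 1, 1, 0, 1],
    [1, 1, 1, 0, 0, 1],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 1, 0, 1, 1],
]
print(best_companions(responses, 0, 1))  # (1,)
print(best_companions(responses, 0, 2))  # (1, 2)
```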

Successive agreement analysis as practiced is not as
comprehensive as the definition just given would seem to imply. Not everyone
is necessarily classified with the one person with whom he has most
in common; it is done, instead, only for those who have the highest
agreement scores, since these are the most dependable classifications.
Likewise, not everyone is necessarily classified with the pair of
persons with whom he has the most in common; only those who have the
highest agreement scores with the most dependable pairs are thus
classified. A similar failure to classify everyone at the quadrad and
even the quintad level may occur, but eventually everyone is
usually classified at some rather early level and then usually
at all subsequent levels.

Successive agreement analysis starts with a matrix of agreement
scores such as shown in Table 1. Then the N highest agreement
scores are selected out, where N is some arbitrary number,
maximal for computer capacity, and often equal to the number of
subjects represented in the matrix. In some cases, every subject
will be represented in at least one of the N highest agreement scores,
but in other cases one or more subjects will be omitted while
others are represented several times. Nevertheless, the N highest
diadic scores are the ones with which individuals tend to have
their highest agreement. The triadic score of every individual with
each of these N pairs is then computed, giving a matrix of triadic
scores as shown in Table 2. The N highest triadic scores are then
selected out. Usually these will include the classification of some
individuals who were omitted in the diadic classification. The process
is repeated at higher and higher levels, each representing a
classification whose psychological meaningfulness can then be examined.

It is essential to point out one characteristic of successive
agreement analysis, for it depends on an assumption which can lead to
errors if it does not hold for the data under analysis. Reference to
Table 2 will help explain the assumption. In this table, individual
L is not included in the N highest pairs listed in the left-hand
column. It is assumed, first, that L will eventually join a category
which contains one of these N pairs. Suppose that this occurs at
the quadradic level. It is assumed that individual L will have a
higher score in this quadrad than he would have in any other
quadrad if we had started with any other N pairs. A similar
assumption is made for all individuals. In general, the assumption
says that the N highest pairs are sufficient in order to realize the true
classifications later on.
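The diadic-to-triadic procedure can be sketched as follows, again assuming that a group's agreement score is the number of items on which all its members respond identically (an individual-by-pair entry of Table 2 would then be a triad score). N and the responses are invented, and ties are broken by list order.

```python
from itertools import combinations

def group_agreement(responses, members):
    """Assumed score: number of items on which every member gives the same response."""
    rows = [responses[i] for i in members]
    return sum(len(set(col)) == 1 for col in zip(*rows))

def top_n(scored, n):
    """The n highest-scoring (group, score) items; stable, so ties keep list order."""
    return sorted(scored, key=lambda kv: kv[1], reverse=True)[:n]

def grow(responses, groups, n):
    """Add every outside individual to each retained group, score the enlarged
    groups (duplicates merged), and keep the n highest."""
    scored = {}
    for g in groups:
        for p in range(len(responses)):
            if p not in g:
                bigger = tuple(sorted(g + (p,)))
                scored[bigger] = group_agreement(responses, bigger)
    return [g for g, _ in top_n(scored.items(), n)]

def successive_agreement(responses, n, max_size):
    """Keep the n highest diadic groups, then triads, and so on up to max_size."""
    pairs = [(p, group_agreement(responses, p))
             for p in combinations(range(len(responses)), 2)]
    levels = {2: [g for g, _ in top_n(pairs, n)]}
    for size in range(3, max_size + 1):
        levels[size] = grow(responses, levels[size - 1], n)
    return levels

responses = [
    [1, 1, 1, 1, 0, 0],
    [1, 1, 1, 1, 0, 1],
    [1, 1, 1, 0, 0, 1],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 1, 0, 1, 1],
]
print(successive_agreement(responses, n=2, max_size=3))
# {2: [(0, 1), (1, 2)], 3: [(0, 1, 2), (1, 2, 4)]}
```

With N = 2, the pair (3, 4) ties for the highest diadic score but is cut by list order, so individuals 3 and 4 never get to form a category together at the next level. This is a small instance of the kind of error the assumption above warns about.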


Some Findings 

Some of these methods have been tried sufficiently to suggest
hypotheses for further study. A first suggestion is that there are
items whose responses have differential validity, i.e., they measure
different things in different people. When the value of these items is
assessed in terms of invariant validity, they are found to have rather low
validity. It is possible that nearly all items may have at least some
differential validity. We may, in general, have underestimated the
validity of all items by attempting to assess it in terms of invariant
validity exclusively.


TABLE 2

Agreement Scores Between Individuals and Pairs of Individuals: Hypothetical Data

Rows (left-hand column): pairs of individuals, such as AC, AE, AG, BD, BH, CE, and EG. Columns: individuals A through L.

The x's appear where the score would mediate between an individual and a pair
containing that individual; the table is restricted to scores for groups of three
different individuals.
When we have compared our types with external criteria, we have
found those at the lower levels such as species and genera to be 
more predictive than the higher level ones, where the categories 
might be expected to be more reliably determined. This outcome 
suggests that there are many relatively unique types and that large 
samples of subjects are essential in order to isolate them in a de- 
pendable fashion. We thus need pattern-analytic methods which 
can classify several hundred subjects. Electronic computers with 
high storage capacity make this goal possible of realization. 
ship between item content and what was being measured was
probably a heritage from achievement testing procedures. After
all, achievement tests in subject-matter fields such as geography,
arithmetic, history, etc., have been used for centuries, while
psychological tests, as we know them, have been in existence for but a
few decades. If one wishes to measure achievement in geography,
for example, one uses a test that asks questions about geography.
That is only common sense and admits no debate. Thus it was
natural to apply the same technique to personality and interest
measurement by using items which unmistakably mirrored the area
being measured. If personality adjustment were being assessed,
items were straightforward in asking whether the subject “worried
more than most people” or whether he “often had bad dreams at
night.” This approach worked in its fashion. It did not produce
very good measures of personality, but it produced better
measures than we had before. It was obviously a good start in the right
direction.

There were a number of straws in the wind which indicated that
the a priori face value of item content might not be worth much
in some behavioral areas. At first glance, it seems reasonable to
postulate a close relationship between accident frequency and
reaction time, on the assumption that the slowpoke would be prone
to mishaps. As early as 1929, however, Farmer and Chambers (3…)
reported that correlations between reaction time and number of
accidents for several occupational groups hovered around zero.
Curiously, in motor vehicle driver testing, reaction time measures
are often included even now in “dummy” car testing apparatus,
apparently on the invalid assumption that those who can get their
feet off the gas pedal and onto the brake quickly will have fewer accidents.
It is only fair to observe that this particular response measure may
be included less for its predictive efficiency than for its sales value
in influencing the business executives who hire the testing.


Concern for Face Validity 

While psychologists would not be misled by face validity in such
situations when they were in the iron grip of a string of zero criterion
correlations, they sometimes appear to cling to other unsupported
beliefs concerning objective personality tests. Indeed, they
sometimes express such beliefs in print. In 1945, for example, Meehl
(17) found it necessary to take Hutt (38) to task for asserting that
structured personality tests were based on the assumption that
the items would have the same meaning for all subjects who took
the test. This is probably still a fairly prevalent misconception;
however, as Meehl emphatically points out, neither the Minnesota
Multiphasic Personality Inventory (MMPI) nor Strong's Vocational
Interest Blank (VIB) makes this assumption.

A tremendous number of articles concerning these two tests
have been published (over 800 on the MMPI alone), and it
should be quite clear that from the MMPI or VIB standpoint,
it is not important whether the meaning is the same
or different for all subjects, nor does it matter whether the
subject is being truthful or even a good judge of his own
behavior. If a subject responded “true” to an item such as “I like to
mingle in crowds,” it is not important whether he really liked crowds
or not. The important thing is what behavioral correlates can be
empirically identified with such a response. If most people like
crowds and most paranoids significantly do not, we have a good item
for measuring paranoia. Furthermore, with similar levels of
statistical significance, any content could be used for such an item, whether
it dealt with crowds, horses, or Socrates. These last few sentences
may be unnecessarily sounding the alarm long after the guard is
roused and on the alert; yet the point is of such import that
excessive zeal may be tolerated.
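The point about empirical keying can be made concrete. The sketch below keys an item by contrasting how often a criterion group and a comparison group answer “true,” using a two-proportion z test with a conventional threshold. This is a simplified modern stand-in, not the procedure Hathaway and McKinley or Strong actually used, and the counts are invented. Nothing in the function looks at what the item says.

```python
import math

def two_proportion_z(x1, n1, x2, n2):
    """z for the difference between two independent proportions (pooled SE)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

def key_item(crit_true, crit_n, comp_true, comp_n, z_crit=2.58):
    """Return (scored response, z). The scored response is the answer that marks
    the criterion group, or None when the groups do not differ beyond z_crit.
    Only response counts are used; the wording of the item is never consulted."""
    z = two_proportion_z(crit_true, crit_n, comp_true, comp_n)
    if abs(z) < z_crit:
        return None, z
    return ("true" if z > 0 else "false"), z

# Invented counts for an item such as "I like to mingle in crowds":
# 30 of 100 criterion cases and 800 of 1000 comparison cases answer "true".
key, z = key_item(30, 100, 800, 1000)
print(key, round(z, 2))
```

Here the item is keyed “false” for the criterion group. The same counts attached to an item about horses or Socrates would be keyed identically.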

Thus far it has been noted that personality and interest test items
do not need to have a priori meaningfulness. The MMPI and VIB
have fully demonstrated this, and they are very good tests indeed.
Rather paradoxically, however, while in practice adhering to
empirical validation rather than to a priori item content, Hathaway and
McKinley (36) in the MMPI and Strong (57) in the VIB paid a good deal
of attention to the content of their test items. The MMPI used 26
carefully described categories of items which ranged from general
health to psychotic symptoms, and the VIB employed lists of
occupations, amusements, etc. There was an obvious concern for face
validity in content. Just why this was so is hard to say;
it may have been a sardonic deference to what test users
might expect to find. Be that as it may, the VIB used only items
which had some clear-cut relationship to vocations and
day-to-day work. The MMPI has mostly items with content of reasonable
face validity for personality measurement; however, it has a
sizeable number of items which are admittedly enigmatic when
scrutinized in this way.

It appears, therefore, that while objective personality and interest
tests pay rather careful attention to item content, at least two of
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the best and most widely used tests of this type do not depend for 
sconng purposes upon an appropriate o\ fi ‘ h ng/«^ons e being made 
to the test content As an elaboration, Meehls (47, p 300) rem 
has pertinence, “Thus it puzzles us but does not disconcert us when 
this relation cannot be elucidated, the science of behavior being 
in the stage that it is That 1 sometimes tease animals (answered 
False) should occur in a scale measunng symptomatic depressi 
is theoretically mysterious, just as the tendency of certain semz 
phrenic patients to accept position as a determinant m responding 
to the Korschach may be theoretically mysterious 

The present paper is an attempt to offer an explanation which seeks to dispel a portion of the theoretical mystery to which Meehl refers, and to gather evidence from various sources in an attempt to demonstrate that the particular content of objective personality and interest tests is unimportant. This is not to say that no content whatsoever is essential for personality and interest tests. Some sort of stimulus pattern is required, of course; however, virtually any content of any sense modality should be suitable, and under some conditions the content may be so insignificant as hardly to deserve the name. I believe the available evidence indicates that items dealing with jobs, social activities, attitudes, adjustment, etc., are quite unnecessary for objective personality, interest, and similar tests. One can use such content if he so desires, but one can equally well use abstract designs, sounds, lists of foods, lights, imaginary questions, spiral after-effect, and content of an equally wide range. In this sense, then, item content is unimportant.


The Deviation Hypothesis

Before reviewing the evidence for the unimportance of particular item content, it seems appropriate to offer some theoretical explanation of why content is not important. At present this can be stated only as a hypothesis, though one which has been supported at a number of points by empirical test. This is the Deviation Hypothesis, which has been set forth in several previous publications (12, ...).

An outline of the hypothesis will serve as a framework for the remainder of the present paper. There are literally hundreds of studies dealing with isolated empirical demonstrations of the prediction, with varying degrees of success, of certain facets of behavior from other, ostensibly unrelated facets of behavior. The commonest examples of this are the variety of clinical measurement devices, although there are many others. The question at hand is how we
can account for the predictive usefulness described in so many of these researches. What, in other words, is the common thread running through such myriad studies which deal with bits and pieces of behavior? It is to these questions that the Deviation Hypothesis is directed.


The Deviation Hypothesis is based upon biased responses. The emphasis, however, is not placed upon the bias itself but rather upon the departures from an established pattern of bias. This latter is the important factor; indeed, it is the key to the problem before us. In a "true-false," "heads-tails," or "agree-disagree" response situation, for example, the responses rarely follow a normal probability distribution where the stimulus pattern is relatively unstructured. Instead of a 50-50 percentage distribution of responses, one often finds 80-20 or some equally skewed pattern indicating bias. Cronbach (26, 27) has described a large number of such response sets, as he called them, in psychological testing, and other writers (17, 24, 33, 43, 52) have also provided evidence for the existence of bias. The Deviation Hypothesis is not directly concerned with the responses which contribute to the pile-up, the 80 per cent who call "heads" when a coin is flipped, for example. Rather, the interest is centered in the 20 per cent who go counter to the "heads" response and say "tails," or possibly something else, or even say nothing at all.
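To make the 80-20 versus 50-50 contrast concrete, an exact binomial test can say whether a two-choice split departs from an even split, and which option is the deviant (minority) one. This is a minimal sketch, not a procedure given in the chapter: it assumes that an unbiased item would split 50-50 and that the .05 level marks an established bias, and the function names are hypothetical.

```python
from math import comb


def binomial_two_sided_p(k: int, n: int, p: float = 0.5) -> float:
    """Exact two-sided binomial p-value: total probability of every outcome
    that is no more likely than the observed count k."""
    # Plain floats are adequate for samples up to roughly 1,000 subjects.
    probs = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    observed = probs[k]
    # The small relative tolerance keeps outcomes tied with k despite rounding.
    return min(1.0, sum(q for q in probs if q <= observed * (1 + 1e-7)))


def describe_bias(counts: dict[str, int], alpha: float = 0.05) -> dict:
    """For a two-choice item, name the modal and deviant (minority) responses
    and test whether the split departs from 50-50 (assumed chance baseline)."""
    if len(counts) != 2:
        raise ValueError("expected exactly two response options")
    (modal, n_modal), (deviant, n_deviant) = sorted(
        counts.items(), key=lambda kv: kv[1], reverse=True
    )
    p_value = binomial_two_sided_p(n_modal, n_modal + n_deviant)
    return {"modal": modal, "deviant": deviant,
            "p_value": p_value, "biased": p_value < alpha}


# Hypothetical coin-call tallies from 100 subjects, as in the text's example.
print(describe_bias({"heads": 80, "tails": 20}))
```

For the 80-20 tallies the p-value falls far below .05, so "tails" is flagged as the deviant response; a 52-48 split would not count as an established bias under these assumptions.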

The persons who deviate from the established pattern of bias in such insignificant responses as responding "tails," "dislike," "disagree," and so on are not merely different in such minor or noncritical aspects of behavior. They are also different in critical or significant aspects of behavior, or so the Deviation Hypothesis would have it. The noncritical aspect of behavior is a reflection of a critical aspect; the two go hand in hand. The critical aspect is a personality manifestation. It may be aberrant adjustment such as schizophrenia or chronic anxiety, or it may be some other aberrant condition such as genius, mental retardation, accident proneness, creativeness, chronic heart disease, or any other condition which may be objectively defined on some behavioral dimension. Thus we would include physicians as well as kleptomaniacs, and engineers as well as scholastic underachievers, as being capable of objective, operational definition.

Accordingly, the Deviation Hypothesis has been stated (13, p. 159): "Deviant response patterns tend to be general; hence those deviant behavior patterns which are significant for abnormality (atypicalness) and thus regarded as symptoms (signs, earmarks) are associated with other deviant response patterns which are in noncritical areas of behavior and which are not regarded as symptoms of personality aberration (nor as indicators, signs, earmarks)." It should be emphasized that abnormality is to be taken in the literal meaning of "away from normal." In this sense psychotics or lawyers, for example, show certain responses which distinguish them from each other and also from the rest of the general public. Everyone makes many responses which are quite like those made by the majority of people. Everyone also makes certain other responses which are peculiarly his own or peculiarly shared by a special group. These certain other responses are, of course, not shared by a majority of people; hence they are the responses designated as deviant, and they may be precisely defined statistically in terms of the level of significance by which they depart from the common response. Insofar as can be ascertained, these deviant responses are the product, singly or in combination, of past learning, inherited structure, and organic or physiological state. Thus it may be said that the deviant responses which differentiate engineers from physicians, for example, are chiefly the product of past learning, whereas organic factors (possibly at times associated with inherited defect) are presumed to be the basis for any deviant responses exhibited by patients with chronic heart disease.

Very likely, learning pervades to some degree all aspects of deviant responses, including those responses which are rooted in hereditary or physiological aberration. Habits of living, to take a case in point, may produce stress which culminates in cardiac disorder in the manner described by Selye (51). In other cases, presumably, a weakness of heart structure would require new learning, that is, habits which would avoid taxing the weak heart. According to the Deviation Hypothesis, both conditions should produce deviant response patterns, though not necessarily the same pattern. It should be emphasized that these are but illustrations, not evidence, of how learning might be involved in what appears to be essentially an organic condition.


Particular Stimulus Content is Unimportant

Thus, what has been said is that no particular content is needed for interest and personality tests, nor, for that matter, in a wide variety of other behavioral measures. What is needed are stimuli that will elicit deviant responses or, more accurately, stimuli which will produce "response sets" or biases from which deviant response patterns may be statistically identified. Such stimuli should be relatively unstructured, since lack of structure facilitates the appearance of bias. Accordingly, the hypothesis with respect to item content has been stated (18, p. 160): "Stimulus patterns of any sense modality may be used to elicit deviant response patterns; thus particular stimulus content is unimportant for measuring behaviors in terms of the Deviation Hypothesis." Attention is called to the fact that this statement makes no assertion that any item is just as good as every other for discriminative purposes. Such a claim would be patently absurd.

What is meant is that if an item concerned with nightmares, for example, distinguishes schizophrenics from normal persons at the five per cent level of confidence, one can locate equally valid items which are utterly different in content, items with content such as interlaced triangles, musical sounds, autokinetic phenomena, etc. But if interlaced triangles are found to be equivalent to nightmares as an item, it certainly does not follow that a drawing of interlaced circles would be just as good as either, or that some musical sound must perforce be as valid as all three. While such might be the case, empirical demonstration is absolutely necessary to determine possible item equivalence. In other words, the same procedure that was used to ascertain the value of the nightmare item must also be applied to the interlaced triangle item, or to any other item.
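Because every item, whatever its content, must pass the same empirical test, that test can be written once and applied to any item. The sketch below assumes a pooled two-proportion z-test (the chapter does not name a particular significance test), the five per cent level, and invented endorsement counts; the item labels merely echo the examples above, and the function names are hypothetical.

```python
from math import erfc, sqrt


def two_proportion_z(hits_a: int, n_a: int, hits_b: int, n_b: int):
    """Pooled two-proportion z-test; returns (z, two-sided p-value)."""
    pooled = (hits_a + hits_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:  # everyone (or no one) endorsed the item in both groups
        return 0.0, 1.0
    z = (hits_a / n_a - hits_b / n_b) / se
    return z, erfc(abs(z) / sqrt(2))  # erfc(|z|/sqrt 2) = 2 * (1 - Phi(|z|))


def validated_items(item_counts, n_criterion, n_normal, alpha=0.05):
    """Keep every item, regardless of content, whose endorsement rate differs
    between the criterion group and the normal group at the alpha level."""
    kept = []
    for item, (hits_crit, hits_norm) in item_counts.items():
        _, p = two_proportion_z(hits_crit, n_criterion, hits_norm, n_normal)
        if p < alpha:
            kept.append(item)
    return kept


# Invented counts of keyed answers among 100 schizophrenic and 100 normal subjects.
counts = {
    "nightmares": (45, 25),
    "interlaced triangles": (42, 24),
    "interlaced circles": (30, 27),
}
print(validated_items(counts, n_criterion=100, n_normal=100))
```

With these made-up figures the nightmare and interlaced-triangle items both pass, while the interlaced-circle item does not, which is the chapter's point: equivalence has to be shown item by item.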

From what has been said, it should be possible to use a series of abstract designs for test items and do about as good a job of measuring personality as can be done with traditional verbal items of the "I am troubled with insomnia" variety. Content other than abstract designs could, of course, be used, but this example will do for a beginning. The Perceptual Reaction Test (PRT) (1?) has been used for this purpose, since it was developed to elicit sets. It is composed of 60 abstract designs drawn with ruler and compass, and the subject checks either Like Much, Like Slightly, Dislike Slightly, or Dislike Much for each design. Only seven minutes, on the average, are required by normal subjects to take the test. It would really be a much more effective comparison if the PRT had several hundred designs, since most of the personality tests in wide use have hundreds of items. Be that as it may, a number of studies have been completed which indicate that even a mere 60 designs of no particular meaning can do a good job of reflecting certain facets of personality.


Variety of Usable Stimuli 

Berg and Collier (15) found that groups operationally identified by the [...] Suggestibility Test made significantly more extreme choices (like much or dislike much) on the PRT when compared with [...] subjects. Lewis and Taylor (41) obtained [...] results with the same test; however, their findings demonstrated that the extreme choices were not preferences for extreme [...], as Berg and Collier thought, but were actually preferences for [...] option content in the PRT. A much more detailed study of the diagnostic possibilities of the PRT was published by Barnes (7), who used 1,700 normal persons (1,000 males and 700 females) as controls. Barnes administered the PRT to 546 (360 males and 186 females) clinic and mental hospital patients. By identifying deviant responses, Barnes was able to construct clinical scales as follows: [...], for general NP disturbance; Psy, for psychotic condition; [...], for schizophrenia; Ch, for character disorder; and [...], for [...] diagnostically psychotic and character disorder states. In Barnes's (7, p. 290) words, "It is concluded that response set [...] is related to personality factors, that it has a degree of [...] which compares well with other tests of personality factors, and that it can be used to assess personality disorder."
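The extreme-choice score in the Berg and Collier comparison amounts to counting Like Much and Dislike Much checks across a PRT protocol. A minimal sketch, assuming responses coded with the hypothetical abbreviations LM, LS, DS, and DM; the protocols are invented and shortened to five designs rather than the real 60.

```python
from statistics import mean

# Hypothetical codes for the four PRT response options.
OPTIONS = {"LM": "Like Much", "LS": "Like Slightly",
           "DS": "Dislike Slightly", "DM": "Dislike Much"}
EXTREME = {"LM", "DM"}


def extreme_count(protocol: list[str]) -> int:
    """Number of extreme choices (Like Much or Dislike Much) in one protocol."""
    unknown = set(protocol) - set(OPTIONS)
    if unknown:
        raise ValueError(f"unrecognized response codes: {sorted(unknown)}")
    return sum(response in EXTREME for response in protocol)


def group_mean_extreme(protocols: list[list[str]]) -> float:
    """Mean extreme-choice count for a group of protocols."""
    return mean(extreme_count(p) for p in protocols)


# Invented five-design protocols for two small groups.
patients = [["LM", "DM", "DM", "LS", "LM"], ["DM", "DM", "LM", "DS", "LM"]]
normals = [["LS", "DS", "LM", "LS", "DS"], ["DS", "LS", "LS", "DM", "DS"]]
print(group_mean_extreme(patients), group_mean_extreme(normals))
```

Whether a group difference in such counts is significant would still have to be tested, and, as the Lewis and Taylor result suggests, a high count need not mean what it first appears to mean.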

The PRT was also used by Hesterly and Berg (37) to measure maturity in relation to schizophrenia. The PRT responses of normal children aged 8, 10, and 12 were compared with those of adults, and the younger age groups were found to have response patterns most different from adults, with the difference less marked for the older age group. Since immaturity is commonly associated with schizophrenia, it was postulated that no significant difference would be found between the deviant response patterns of adult schizophrenics and those of normal young children. This was found to be the case. The youngest groups of normal children were different in their response patterns from normal adults but not different from adult schizophrenics.

Thus it appears that with a simple test composed of only 60 meaningless designs we can measure certain aspects of personality by means of the Deviation Hypothesis in the same way that the objective personality inventories do when using traditional content. By the same token, it would not be surprising to find a relationship between behavior disorders and responses to other than traditional personality test content, such as a list of foods. [...] (61) and Gough (34), for example, showed that neurotic males indicated that they disliked significantly more foods than normal males.

In another study, Wallen (62) found a similar, significantly greater number of food aversions in various clinical diagnostic categories, such as intracranial injury, anxiety neurosis, hysteria, epilepsy, etc., when compared to normal males. In a comparison of number of food aversions and scores on a test of adjustment, Altus (4) found a correlation of .497 between the two measures for data obtained from Army illiterates. Smith, Powell, and Ross (55) found that high-anxiety individuals, as identified by the Taylor Manifest Anxiety Scale, showed significantly more food aversions than low-anxiety subjects. These studies were not done as investigations of the Deviation Hypothesis, since they antedate publication of this concept. However, they illustrate an aspect of the unimportance of particular item content; and like many other studies concerned with critical and noncritical responses, they can be fitted comfortably into the deviant response concept.

Stimuli for conditioned response, autokinetic, and spiral after-effect perceptions involve noncritical areas of behavior in the sense that the responses are not regarded in themselves as symptoms or earmarks. However, several studies have indicated that deviant patterns in noncritical facets of such behavior are reflections of deviations in critical areas. Taylor (58) and Spence and Taylor (...) found that anxious subjects, as identified by her Manifest Anxiety Scale, were consistently and significantly superior in all measures of eyeblink conditioning and extinction compared to nonanxious subjects. Voth (60) tested 845 mental hospital patients and [?]23 normal subjects with the autokinetic phenomenon. He found that distinctive patterns of deviant responses were characteristic of certain patient groups. Schizophrenics, epileptics, and anxiety patients, among others, revealed more pronounced apparent movement as compared to normal groups. Manic-depressive and [...] patients either experienced no apparent movement or the movement was much less extensive than normal. Price and Deabler (49) and Freeman and Josey (31) used the Archimedes spiral after-effect as a means of differentiating patients with brain damage and patients with memory impairment from normal subjects. Aaronson (1) used a similar technique with an epileptic population.


Variations in Response Measures 


Perhaps because of its essential role in ordinary communication, deviant language behavior has for centuries been recognized as related to psychological states. The staccato speech of [...] patients, the painfully labored utterances characteristic of certain depressions, and the neologisms of schizophrenia are cases in point. More recently, subtler aspects of language have been studied with respect to their value as indicators of disturbed emotional states. These studies may be mentioned as additional examples of bits of research which lend support to the assertion that particular item content is unimportant for personality assessment. The studies can also be fitted into the broad framework of the Deviation Hypothesis, although they were not intended as tests of that hypothesis.


None of the studies has scaled the responses in the usual objective personality test fashion; however, it seems feasible that this could be done should it be deemed advantageous to do so. Chodorkoff and Mussen (21), for example, found that schizophrenics chose inferior definitions on a special vocabulary test when compared to normal subjects. Lorenz and Cobb (44, 45) reported that psychoneurotic patients used more verbs and pronouns but fewer adjectives and prepositions as compared to a normal control group. As client adjustment improved during a series of therapeutic interviews, Berg (14) found a decrease in the ego words "I," "me," and "my," and an increase in the empathic words "you," "we," and "us." Other studies, such as those by Fairbanks (29), Mann (46), and Roshal (50), among others, indicate that vocabulary variability relates to personal adjustment. Thus what one says and how one says it are apparently capable of manifesting deviant response patterns. On this basis, then, language behavior could serve for personality assessment purposes. Although any such test might prove to be unwieldy and inconvenient, a test could be prepared from such responses, judging from the available evidence.

The list of studies which employ various forms of content to reflect various facets of personality would run into the thousands. Figure drawings, to mention but one class, have been used in numerous studies to investigate such matters as obesity (40), facial disfigurement (2), and homosexuality (6), on the hypothesis that the responses were related to personality. A few studies indicate that auditory content may also be validly utilized to measure personality variables. A long-play record of musical sounds has been developed by Cattell and Anderson (19), which is probably the first use of musical excerpts for diagnosing mental illness. Simon and others (53) have reported differences in the recognition of mood in music for patient populations and for normal subjects. The "tautophone" (35), a device which emits sounds which resemble spoken words but which are actually meaningless, has been employed as a personality test.

No studies have been found in which sense modalities such as 
taste, smell, tactile sensitivity and the like have been used in con- 
nection with personality measurement. One study by Singer and 
Young (54), however, indicates that response bias exists in responses 
to olfactory stimuli; hence it may be possible to use this sense for 
personality assessment. Theoretically, of course, all senses should 
provide content which may be used in this way. In practice, sense 
modalities other than vision are relatively cumbersome to use for 
personality testing purposes. This may account for the vast pre- 
ponderance of personality measures which involve reading or look- 
ing at something. It should, however, be obvious that a wide range 
of content can be used and has been used in appraising personality. 

What is essentially a deviant response technique has been employed in studies of physical diseases, some of which have psychosomatic components. Various stimulus patterns have been used, such as those found in the Rorschach, TAT, Blacky Test, MMPI, etc.

A sampling of such studies includes disorders such as uterine dys- 
function (25), peptic ulcer (18), rheumatoid arthritis (22), derma- 
tosis (48), leprosy (42), constipation (5), and others. To what extent 
personality variables influenced or determined the course of these 
diseases or to what extent the diseases produced certain personality 
changes is unknown. The significant point is that, even in such 
cases, a variety of stimulus content can be used to differentiate them 
from normal subjects on the basis of deviant responses. 

Some scales which have emphasized content with direct face 
validity, such as the authoritarian personality F scale (3), have been 
shown actually to measure a considerable amount of simple acqui- 
escence and not only fascist proclivities. This has been shown by 
studies such as those of Bass (10), Cohn (23), and Chapman and Campbell (20). By its use of a wide range of content and its use of atypical responses, the MMPI moved in the direction of de-emphasizing item content. This is particularly borne out in the use the MMPI makes of subtle items, that is, items which are quite unrelated in terms of face validity to the personality dimensions they measure.

If it is only the deviant responses that are important, no matter how elicited, and not the test item content, one should be able to obtain a fairly definite relationship between valid clinical scales and a simple count of the number of atypical responses. Barnes (8) did this with the MMPI by just counting the total number of atypical responses (X's on the MMPI record form) and correlating the number of atypical responses with the clinical scales. The atypical responses, of course, were drawn from all of the scales, and no attention whatsoever was given to the content. He found that the simple atypical response total correlated .93 with the Sc scale and .87 with the Pt scale, which is about as high as the reliability of these scales. Ordinarily, one would expect a less impressive relationship; however, the great attention paid in the MMPI to clean criterion groups and the use of a wide range of item content probably permitted a more effective expression of deviant responses. In another study, Barnes (9) showed that atypical "true" answers on the MMPI, without regard to item content, represented the psychotic factor of Wheeler, Little, and Lehner, and that atypical "false" answers had a heavy loading for the neurotic factor.
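Barnes's count ignores scale membership and content entirely: a protocol's score is simply the number of answers falling in the atypical direction, and that total can then be correlated with a clinical scale. A sketch under stated assumptions: the five-item key, the protocols, and the scale scores are all invented (the real count used the X marks on the MMPI record form, which are not reproduced here), and the correlation is an ordinary Pearson r.

```python
from math import sqrt


def atypical_total(answers: dict[int, bool], atypical_key: dict[int, bool]) -> int:
    """Count answers that fall in the atypical direction, ignoring which
    scale (or what content) each item belongs to."""
    return sum(1 for item, answer in answers.items()
               if item in atypical_key and answer == atypical_key[item])


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Product-moment correlation between two equal-length score lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical key: item number -> the answer that counts as atypical.
key = {1: True, 2: False, 3: True, 4: True, 5: False}
protocols = [
    {1: True, 2: False, 3: True, 4: False, 5: False},
    {1: False, 2: True, 3: False, 4: False, 5: True},
    {1: True, 2: False, 3: False, 4: False, 5: True},
]
totals = [atypical_total(p, key) for p in protocols]
scale_scores = [70, 48, 55]  # invented clinical-scale scores for the same people
print(totals, round(pearson_r(totals, scale_scores), 2))
```

Three invented protocols prove nothing about the MMPI, of course; the sketch only shows how little the procedure depends on what the items say.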


Importance of Deviant Response Patterns 

Thus the important variable is not content per se but rather the pattern of deviant responses. Such responses can be elicited by a wide range of stimuli, as the evidence presented here illustrates. Occasionally, the question comes up of whether some deviant behavior pattern in a noncritical area is necessarily symptomatic of a critical personality deviation. Some persons wear strange clothing or indulge in weird hair styles; thus they exhibit responses which are deviant. But, as has been pointed out, they may not necessarily be deviant in a critical area of behavior. This is true; however, it must be emphasized that a burnoose on Fifth Avenue or hair parted crosswise from ear to ear each represents but one deviant response. A pattern or series of deviant responses would be another matter. Every normal person exhibits some deviant responses in noncritical areas to a variety of stimulus content, and he is still considered to be normal. The crucial point is that people who are deviant in a critical area of behavior exhibit a significantly larger number of deviant responses in noncritical areas. The pattern of deviant responses is also distinctive with respect to the critical behavior deviation.
A single deviant response is practically never sufficient to measure or otherwise identify personality characteristics. The same, of course, is true of a single item on a personality test of conventional content. A male transvestite who dons female wearing apparel might ostensibly be regarded as offering an example of a single deviant response which is indicative of critical behavior aberration. Yet actually such preference for women's clothes is probably not one but rather a large number of deviant responses. That is, the transvestite would probably use lipstick and perfume, rouge his cheeks, enamel his nails, walk with a mincing gait, separately put on a number of articles of female clothing, etc. Each of these should be appropriately regarded as a separate deviant response, adding up to a respectable total.

Since everyone exhibits some deviant responses, it seems unlikely that a single response will suffice for identifying significant aspects of personality. Attempts to use a few simple deviant response measures, such as the handedness studies of Wile (65), Goddard (32), and Doll (28), met with very limited success in their investigations of the relationship of left-hand preference to feeblemindedness and neurotic reactions. Yet these studies did indicate that hand preference could probably be used as one deviant response item. A large number of such items might conceivably be scaled into a respectable personality test. Be that as it may, there is good reason to believe that a wide variety of content can be used for personality test items; however, as has been the case in the past, a lengthy series of items will be necessary, whatever content may be used.

The evidence reviewed here is believed to indicate that there is nothing of special value in particular item content for objective personality and similar tests. Verbal content of the traditional kind used in personality tests is not essential, for a wide variety of content may be employed with equal effectiveness. Indeed, any content which produces deviant response patterns will serve, judging from the available evidence. The important thing is not particular content, but rather a series of deviant responses and operationally clean criterion groups. These are the absolute essentials for using deviant responses to measure personality. Thus it is possible to use items of traditional content to assess personality; however, conditioned responses, spiral after-effect, abstract designs, autokinetic phenomena, musical sounds, language behavior, drawings, and other content may also be used. Some test items, of course, will be better for certain purposes than others, just as some items of conventional content are better than others for certain testing purposes. But whatever the content, valid discriminations of a number of facets of personality can be made. Accordingly, for personality and similar tests, particular item content is unimportant.
Social Desirability and
Personality Test Construction

Allen L. Edwards

The University of Washington 


By an objective test, I shall mean any test for which the method of scoring responses to the test materials is rigorously defined. This use of the term objective implies nothing about the nature of the test materials designed to elicit responses. It refers only to the method of scoring. With an objective test, the method of scoring can be succinctly described by means of a scoring key. The scoring key tells us what to do with each of the responses made by the subject. For example, in a True-False objective test, the scoring key may tell us that True responses to certain items in the test are to be weighted 1 and False responses are to be weighted 0. For other items, the True responses are to be weighted 0 and False responses weighted 1. The subject's score on this test becomes the sum of the weights assigned to the responses he has made.
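The verbal description of a scoring key translates directly into a lookup table of weights. The sketch below is illustrative rather than Edwards's own: the item numbers and weights are hypothetical, and scoring an omitted item as zero is an assumption the text does not address.

```python
from typing import Mapping

# A scoring key maps each item to the weight earned by each possible answer.
ScoringKey = Mapping[int, Mapping[bool, int]]


def score(responses: Mapping[int, bool], key: ScoringKey) -> int:
    """Sum the weights assigned to the responses the subject made.
    Items the subject omitted contribute nothing (an assumption)."""
    total = 0
    for item, weights in key.items():
        if item in responses:
            total += weights.get(responses[item], 0)
    return total


# Hypothetical True-False key: items 1 and 3 are keyed True, item 2 False.
key = {
    1: {True: 1, False: 0},
    2: {True: 0, False: 1},
    3: {True: 1, False: 0},
}
print(score({1: True, 2: True, 3: True}, key))  # 1 + 0 + 1 = 2
```

Storing a weight for every answer, rather than only the keyed one, leaves room for keys in which more than one response category carries a non-zero weight.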

Although the definition I have given for an objective test implies nothing with respect to the nature of the test materials, my discussion of objective personality tests will be restricted to those tests for which the test materials are verbal statements or items. The task set for the subject by the testing situation is one of self-description in terms of the statements or items contained in the test. Objective tests of this kind are commonly known as personality inventories, schedules, or questionnaires.
In the typical personality inventory, the number of possible responses available to the subject is fixed by the nature of the test, so that the subject must choose one of the several alternatives presented to him. I shall refer to these various alternatives as response categories. An objective personality inventory may have any number of response categories. However, we seldom find inventories with more than five response categories and, in most cases, the inventories in current use have only two or three response categories. These are usually of the form: True-False, Yes-No, Agree-Disagree, Like-Dislike, and so forth. When three response categories are provided, the third category is typically an Undecided category.

Although it is not a necessary condition for an objective personality inventory, we generally find that only one of the response categories is keyed. By a keyed response, I shall mean the response that is assigned a non-zero scoring weight. With a True-False test designed to measure a personality variable, the keyed response is the one that we believe is more likely to be given by those who have a greater degree of the variable than by those who have a lesser degree. For example, in an inventory designed to measure introversion, the following item might appear: I keep in the background on social occasions. If we believe, for one reason or another, that those who have a high degree of introversion are more likely to answer True to this item than those who have a lesser degree of introversion, the keyed response would be True.

It will be convenient to confine the discussion to those objective personality inventories in which the number of response categories available to the subject [...] and in which only one [...] multiple categories.


Methods Used in Developing Inventories
In constructing objective personality inventories, [...] developing their personality inventories. The Minnesota Multiphasic Personality Inventory and the Strong Vocational Interest Blank, on the other hand, were constructed by Hathaway and McKinley (20) and Strong (26), respectively, using a procedure which I shall call the method of criterion groups. Still a third approach is the one used by Allport, Vernon, and Lindzey (1) in constructing the Study of Values, and which I have also used in developing the Personal Preference Schedule (9). This latter approach I shall refer to as the construct approach.

If the factor analytic approach is followed, one starts initially with a large pool of items. Subjects are asked to respond to each item, and the responses to all possible pairs of items are correlated. The resulting intercorrelation matrix is factor analyzed in anticipation of obtaining a smaller number of factors than items. By rotation of the factor matrix, simple structure may be found. Then, those items with high loadings on a given factor and low loadings on all other factors are placed together. By examining the content of the items with high loadings on a single factor, an attempt is made to see what they have in common. These items, for the factor analyst, will constitute a scale for measuring a single personality variable. Thus Cattell (3, p. 81) has found a factor on which a high score indicates that the subject prefers an art gallery to a game of cards on a fine afternoon, that he does not generally succeed in keeping his emotions under control, that he does not dislike being waited on in personal matters, and that he does not believe that racial characters have more real influence in shaping the individual and the nation than most people believe. Yet, at the same time, he does admit to fits of dread or anxiety for no apparent reason, he does try to bluff his way past a guard or doorman, and he has been known to be a sleepwalker and to talk a great deal in his sleep. Cattell has tentatively labelled this personality variable as "Bohemian Unconcernedness."
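The steps just described (intercorrelate the items, extract fewer factors than items, rotate toward simple structure, then group items that load highly on one factor and lowly on the rest) can be sketched with NumPy. This is not Cattell's procedure: it substitutes a principal-components extraction with the eigenvalue-greater-than-one rule for a true common-factor solution, uses varimax for the rotation, and treats .50 and .30 as the "high" and "low" cut-offs, all of which are assumptions. The demonstration answers are simulated from two latent traits.

```python
import numpy as np


def principal_loadings(answers, n_factors=None):
    """Unrotated loadings from the item intercorrelation matrix.

    Principal components stand in for a common-factor extraction here;
    if n_factors is None, factors with eigenvalue > 1 are kept (Kaiser rule).
    """
    corr = np.corrcoef(answers, rowvar=False)   # item-by-item correlations
    eigvals, eigvecs = np.linalg.eigh(corr)     # returned in ascending order
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    if n_factors is None:
        n_factors = int(np.sum(eigvals > 1.0))
    return eigvecs[:, :n_factors] * np.sqrt(eigvals[:n_factors])


def varimax(loadings, max_iter=100, tol=1e-6):
    """Orthogonal varimax rotation toward simple structure."""
    p, k = loadings.shape
    rotation = np.eye(k)
    criterion = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        target = rotated**3 - rotated @ np.diag(np.sum(rotated**2, axis=0)) / p
        u, s, vt = np.linalg.svd(loadings.T @ target)
        rotation = u @ vt
        previous, criterion = criterion, np.sum(s)
        if previous > 0 and criterion < previous * (1 + tol):
            break
    return loadings @ rotation


def scales_from_factors(loadings, high=0.5, low=0.3):
    """Place each item on the factor where it loads >= high, provided its
    loadings on every other factor are < low (signs are ignored)."""
    scales = {factor: [] for factor in range(loadings.shape[1])}
    for item, row in enumerate(np.abs(loadings)):
        best = int(np.argmax(row))
        if row[best] >= high and np.all(np.delete(row, best) < low):
            scales[best].append(item)
    return scales


# Simulated True-False answers (1 = True): items 0-2 depend on one latent
# trait, items 3-5 on another, each with added noise.
rng = np.random.default_rng(0)
n = 2000
latent = rng.standard_normal((n, 2))
continuous = np.hstack([
    0.8 * latent[:, [0]] + 0.6 * rng.standard_normal((n, 3)),
    0.7 * latent[:, [1]] + np.sqrt(0.51) * rng.standard_normal((n, 3)),
])
answers = (continuous > 0).astype(float)
loadings = varimax(principal_loadings(answers))
print(scales_from_factors(loadings))
```

With these simulated answers the procedure should recover two three-item scales. As the text notes, naming each scale would still require inspecting the content of the items that landed on it.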

It is not that a factor analyst would necessarily start out with the notion of developing a scale designed to measure "Bohemian unconcernedness." As a matter of fact, it is characteristic of the factor analytic procedure that not one but several scales result from the application of factor methodology, one for each of the factors obtained. It is also characteristic of the factor analyst that he has no necessary prior notion of what these factors may be, if found, nor of what they may relate to. The name assigned a factor or scale is based upon an examination of the content of those items with high loadings on the factor, and what these items will be is not necessarily known in advance. Furthermore, the number of factors obtained is limited only by the number of items initially used. And the initial number of items in turn is limited only by the capacity of the modern electronic computer.

The criterion group approach demands that we have two contrasting
groups of subjects available. For example, one of these groups may
consist of individuals labelled as schizophrenic by psychiatrists and
the other of individuals not so labelled. These two groups are given
a set of items and differences in the responses of the two groups to
each item are examined. Tests of significance may be applied as a
basis for selecting those items for which there is a statistically
significant difference in response between the two criterion groups.
Thus, it may be found that a significantly larger number of
schizophrenics than normals answer True to the item "I frequently
have pains in my feet." This item will then be selected for inclusion
in a scale, along with any other additional items that differentiate
between the two groups. Item selection is rigorously empirical, and
the person who uses the criterion group approach is, in general, not
at all concerned with item content. He asks only that the items
included in the scale be those that have been found to differentiate
between the two groups of interest. The name assigned to the
variable supposedly measured by the scale is based on the nature of
the criterion groups used in their selection. Thus, MMPI scales have
been constructed to measure schizophrenia, delinquency, depression,
hysteria, low back pain, and so on. The number of scales which can
be constructed following the criterion group approach is limited only
by the number of contrasting groups that can be found.
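The item-selection step can be illustrated with a short sketch. It assumes that a two-proportion z test is the "test of significance" applied to each item. The text does not say which test the original scale builders used. The counts are invented, and the helper names are hypothetical.

```python
import math

def two_proportion_z(p1, n1, p2, n2):
    """z for the difference between two independent proportions (pooled standard error)."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return 0.0 if se == 0 else (p1 - p2) / se

def select_items(criterion_true, n_criterion, control_true, n_control, z_crit=1.96):
    """Return (item index, keyed response) for items that separate the two groups.

    criterion_true[i] and control_true[i] count the True answers to item i.
    Item content is never consulted: selection is purely empirical.
    """
    keyed = []
    for i, (a, b) in enumerate(zip(criterion_true, control_true)):
        z = two_proportion_z(a / n_criterion, n_criterion, b / n_control, n_control)
        if abs(z) >= z_crit:
            # Key whichever response the criterion group gives more often
            keyed.append((i, "True" if z > 0 else "False"))
    return keyed

# Hypothetical counts for 50 patients and 50 controls on three items
print(select_items([35, 26, 10], 50, [15, 24, 30], 50))  # [(0, 'True'), (2, 'False')]
```

The sketch keys either response. An item that the criterion group answers False more often is keyed False, which matches the "either larger or smaller proportion" logic of the method.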

If the goal of the criterion group approach is to develop a scale
useful in the prediction of membership or lack of membership in
groups comparable to the original criterion groups, the procedures
followed seem highly appropriate. But if scores on the scales so
developed are treated, as they so often are, as indicating variation
along a single continuum or dimension of personality, that is another
matter. No matter how rigorously the criterion groups are defined,
it does not seem at all possible that they can ever be made
comparable in all respects but one. It may be possible to equate
them for such variables as age, sex, socioeconomic status, and so
forth, but it is well known that as the number of variables on which
two groups are matched is increased, there is a corresponding
decrease in the number of cases that meet the requirement for
membership in the criterion groups. If we aim for substantial N's
in both groups, then we may have groups contrasting with respect to
the criterion, but this criterion will of necessity be a multiplicity
of many things.

Perhaps the reason so many scales have been developed to measure
clinical rather than normal personality variables is because
criterion groups for clinical variables are available in hospitals and
institutions. We should keep in mind, however, that a scale
developed in this manner can never be better than the criterion group
which provided the basis for item selection. Thus, if a criterion
group is established by psychiatric judgment and if psychiatric
judgment is fallible, as it surely is, this may mean that if we use
the judgments of other psychiatrists to establish the criterion group,
it will not be the same as the original criterion group. This, in turn,
may result in a different set of items being selected for inclusion
in the scale than those selected on the basis of responses of the
original group. Two scales so developed, although supposedly
predicting the same criterion, may bear little relation to one another.

In using the construct approach to the development of a personality
scale, the psychologist starts with at least a vague notion of a
personality variable that is of interest to him. He may have noted,
for example, that some people seem to desire to be the center of
attention. They like to entertain others, to tell amusing stories, to
make themselves conspicuous by wearing unusual clothing, and, in
various ways, to draw attention to themselves. These isolated bits
of behavior may be subsumed under a construct which is tentatively
labelled exhibition. The objective of the construct approach is to
develop a scale which, it is hoped, will measure the construct of
interest. When this approach is used in the development of an
inventory, one does not start with a heterogeneous collection of
items. Rather, the attempt in the initial stage of item formulation
is to "map the construct." The kinds of items we seek are those
which are believed to be relevant to the construct.

When a sufficient pool of items is available, the responses of an 
unselected group of subjects to the items are then analyzed. Cor- 
relational and factor analytic techniques may be used in the attempt 
to select homogeneous items for inclusion in the scale. Or total 
scores for the subjects may be obtained on the complete set of items 
and the individual items analyzed in terms of total scores. This, of 
course, is, in a sense, a criterion group approach, but with one 
important difference. The criterion groups, in this instance, are 
established on the basis of their behavior with respect to the items, 
rather than in terms of an external criterion. 
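Analyzing items against total scores can be sketched as an upper-minus-lower discrimination index. The upper and lower groups come from the subjects' own totals on the item set, not from an external criterion. The 27 per cent split is a common convention assumed here, and the data are hypothetical. The text does not specify which item statistic was used.

```python
def item_discrimination(responses, fraction=0.27):
    """Upper-minus-lower endorsement index for each item.

    responses[s][i] is 1 if subject s gave the keyed answer to item i, else 0.
    The upper and lower groups are formed from total scores on the same items,
    not from an external criterion.
    """
    totals = [sum(r) for r in responses]
    order = sorted(range(len(responses)), key=lambda s: totals[s])
    k = max(1, int(len(responses) * fraction))
    lower, upper = order[:k], order[-k:]
    index = []
    for i in range(len(responses[0])):
        p_upper = sum(responses[s][i] for s in upper) / k
        p_lower = sum(responses[s][i] for s in lower) / k
        index.append(p_upper - p_lower)
    return index

responses = [
    [1, 1, 0],
    [1, 1, 1],
    [0, 0, 1],
    [0, 0, 0],
]
print(item_discrimination(responses, fraction=0.5))  # [1.0, 1.0, 0.0]
```

An item with an index near zero, like the third one here, does not go with the rest of the set. It would be a candidate for removal when seeking a homogeneous scale.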

A major limitation of the construct approach is that the items 
included in the scale are dependent upon the manner in which the 
investigator has mapped the construct initially. For example, one 
investigator's construct of exhibition may not be the same as
another's. Thus, although they may use the same labels, the mapping
of the construct may be quite different with the result that the
items in the two scales may also be different. Further research with 
the scales, along the lines suggested by Peak (25) and by Cronbach 
and Meehl (7), with respect to construct validity may prove of 
value in clarifying the difference between the two constructs 
represented by the two scales. 


Item Endorsement and Social Desirability Scale Value 

Regardless of the approach used in the development of a per- 
sonality scale, there are certain common problems relating to the 
finished product. One of these relates to what I have come to call 
the social desirability variable and it is this problem that I now wish 
to discuss.

You are all familiar with the methods devised by Thurstone for
scaling attitude statements. A number of statements relevant to
some issue or institution are collected and these are submitted to
a judging group. The judging group is not asked to respond in
terms of whether they agree or disagree with each statement, but
only to judge the degree of favorableness of each statement on, say,
a 9-point scale. On the basis of the distribution of judgments for
each statement, scale values are obtained by either the method of
equal-appearing intervals or the method of successive intervals.1
The scale value of a given statement is taken as an indication of
its location on a psychological continuum such that high values
indicate very favorable statements and low values very unfavorable
statements.

I have applied these methods in scaling statements of the kind
that we ordinarily find in personality inventories. The instructions
given to the judging group are such that they are not asked to
respond in terms of whether they agree or disagree with each
statement, or in terms of whether they think it does or does not
describe them, but rather they are asked to judge the social
desirability or undesirability of each statement. In other words, I
ask them to rate how desirable or undesirable they would consider
the behavior or characteristic in other individuals. From the
distribution of judgments a scale value is obtained for each
statement by one of the psychological scaling methods. The scale
value of a statement is taken as an indication of the location of the
statement on a psychological continuum ranging from highly socially
undesirable to highly socially desirable. High scale values
indicate statements that are socially desirable and low scale values
statements that are socially undesirable.

1 These and other methods of psychological scaling are described by
Edwards ( ) and Guilford (18).
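In the method of equal-appearing intervals, a statement's scale value is the median of the judges' ratings, interpolated within the median category. A minimal sketch, assuming 9 categories that are each one unit wide (category k spans k - 0.5 to k + 0.5). The method of successive intervals estimates interval widths from the data and is not shown. The function name and ratings are illustrative.

```python
def scale_value(judgments, categories=9):
    """Interpolated median of ratings on a 1..categories scale.

    Category k is treated as the interval [k - 0.5, k + 0.5), as in the
    method of equal-appearing intervals.
    """
    n = len(judgments)
    counts = [judgments.count(k) for k in range(1, categories + 1)]
    cumulative = 0
    for k, f in enumerate(counts, start=1):
        if f and cumulative + f >= n / 2:
            lower_limit = k - 0.5
            return lower_limit + (n / 2 - cumulative) / f
        cumulative += f
    raise ValueError("no judgments on the 1..categories scale")

# Six hypothetical judges rating one statement for social desirability
print(round(scale_value([7, 7, 8, 8, 8, 9]), 2))  # 7.83
```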

Suppose that we have obtained social desirability scale values
for a large number of statements. The statements are then printed
in the form of a personality inventory. A new group of subjects is
given the inventory and they are asked to respond to each statement
in the usual manner of obtaining self-descriptions. For each
statement we find the proportion of those responding Yes or True, and
we then plot these proportions against the corresponding, but
independently obtained, social desirability scale values. The first
time that I did this I found a linear relationship between the two
variables. The product-moment correlation between the proportion
endorsing an item and the social desirability scale value of the item
was .87 (8).
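The two quantities in this comparison are easy to compute: the proportion of subjects endorsing each item, and the product-moment correlation of those proportions with the independently obtained scale values. The sketch below does both. The responses and scale values are made up for illustration and are not data from the studies cited.

```python
import math

def endorsement_proportions(answers):
    """answers[s][i] is True when subject s answered item i True/Yes."""
    n = len(answers)
    return [sum(row[i] for row in answers) / n for i in range(len(answers[0]))]

def pearson_r(xs, ys):
    """Product-moment correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical: four subjects, four items, with social desirability scale values
sd_values = [2.0, 4.0, 6.0, 8.0]
answers = [
    [False, True, True, True],
    [False, False, True, True],
    [True, True, False, True],
    [False, False, False, True],
]
props = endorsement_proportions(answers)  # [0.25, 0.5, 0.5, 1.0]
print(round(pearson_r(sd_values, props), 2))  # 0.92
```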

Calvin Wright (28) repeated this study with a minor variation.
He gave 140 items to 127 college students and asked them to rate
the degree to which each statement characterized them on a 9-point
scale. The mean rating assigned to the statements in
self-description was then correlated with the social desirability
scale values of the statements. The product-moment correlation
between these two variables for this sample was .88.

Using a Q sort to obtain self-descriptions with the same
statements and with still another sample, I found correlations of .87
between mean Q-sort rating and social desirability scale value for 50
females and .84 for the same variables for a group of 50 males (10).

I have also scaled the items in the Interpersonal Check List for
social desirability. The ICL was then given to another group of
subjects who were asked to describe themselves without signing
their names to their test booklets. The product-moment correlation
between probability of endorsement and social desirability scale
values for the 128 items in the ICL was .83 (11).

These findings have subsequently been confirmed by various
investigators with still other samples of items. Kenny (21), for
example, scaled 25 personality items, originally used in an
investigation by Zimmer (29), for social desirability. The rank-order
correlation between the social desirability scale value and the
proportion endorsing an item was .82 for these 25 statements. Hanley
(19), working with items from the MMPI, reports correlations of .89
and .92 between probability of endorsement and social desirability
scale values for 32 items randomly selected from the Sc or
Schizophrenia scale, and correlations of .82 and .86 between
probability of endorsement and social desirability scale value for 25
items selected at random from the D or Depression scale.
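Kenny's figure is a rank-order correlation. One common way to compute it is Spearman's rho: a product-moment correlation computed on ranks, with tied values given their average rank. The sketch below uses that approach. The text does not say how ties were handled in the original study, and the sample values are hypothetical.

```python
import math

def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman_rho(xs, ys):
    """Product-moment correlation between the ranks of xs and ys."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical scale values and percentages endorsing for four items
print(spearman_rho([1, 2, 3, 4], [10, 30, 20, 40]))  # 0.8
```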

I believe it is possible to generalize, on the basis of the studies
described, and to state that, whenever we have a personality
inventory in which the items vary with respect to their social
desirability scale values, we may expect to find a substantial positive
correlation between probability of endorsement of an item and social
desirability scale value of the item. Consider one possible
implication of this finding. Although Hanley (19) established the
relationship between probability of endorsement and social
desirability scale value for items in the Sc scale of the MMPI using
only 32 of the 79 items in this scale, his selection of items, he
states, was random. Let us assume, therefore, that the relationship
would hold for the complete set of 79 items. Now, recall how the Sc
scale was developed. To be included in the Sc scale, an item had to
differentiate significantly between a group of diagnosed
schizophrenic patients and a group of normal controls. This means
that either a significantly larger proportion or a significantly smaller
proportion in the schizophrenic group had to endorse an item
compared with the proportion in the normal control group. But, on
the basis of Hanley's findings, we have evidence that the relationship
between the proportion in the normal group endorsing an item and
its social desirability scale value is linear, with a product-moment
correlation of .92 expressing the degree of the relationship. In other
words, if an item was to be included in the Sc scale, the proportion
in the schizophrenic group endorsing the item would have to deviate
significantly from the linear regression line relating probability of
endorsement to social desirability scale value for the normal group.

Does this mean that the relationship between probability of
endorsement and social desirability scale value does not hold for the
schizophrenic group? That is one possibility. Another is that the
relationship is linear for the schizophrenic group, but that for this
group the regression line is parallel to that for the normal group,
differing from it only in terms of the Y intercept. Still another
possibility is that both the Y intercepts and the slopes of the
regression lines differ for the two groups. Or perhaps the social
desirability scale values of the items established by the judgments
of the schizophrenic group would differ from those established by
the judgments of the normal group. I do not know the answers to
these questions, but they could easily be obtained through research.
We do have some evidence from a study indicating that social
desirability scale values based upon the judgments of diagnosed
psychotic patients are related to the social desirability scale values
of the same items obtained from judgments of a non-psychotic group.
Klett, for example, reports a correlation of .88 between the social
desirability scale values based upon the judgments of psychotic
patients and scale values based upon the judgments of college
students.
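The argument can be made concrete. First fit a least-squares line of proportion endorsing on social desirability scale value for the normal group. Then measure how far each criterion-group proportion falls from that line. The sketch below does only that and omits any significance test for the deviations. All numbers and helper names are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares (intercept, slope) of y on x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return my - slope * mx, slope

def deviations_from_normal_line(sd_values, p_normal, p_criterion):
    """Residual of each criterion-group proportion from the normal group's line."""
    intercept, slope = fit_line(sd_values, p_normal)
    return [p - (intercept + slope * x) for x, p in zip(sd_values, p_criterion)]

# Hypothetical scale values and proportions endorsing for five items
sd_values = [1, 3, 5, 7, 9]
p_normal = [0.1, 0.3, 0.5, 0.7, 0.9]
p_schizophrenic = [0.3, 0.3, 0.5, 0.5, 0.9]
print(deviations_from_normal_line(sd_values, p_normal, p_schizophrenic))
```

Running `fit_line` on the schizophrenic group's proportions too would let one compare the intercepts and slopes of the two lines. Those comparisons are exactly the possibilities raised above.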


The Social Desirability Hypothesis 

Some years ago, Cronbach (5, 6) called attention to the importance
of response sets as factors influencing scores on psychological tests.
It was his observation that individuals may show response
tendencies to items in tests such that these tendencies are relatively
independent of item content. The number of True responses that a
person gives to a set of items may possibly be a measure of a
general tendency to agree, or, as Cronbach calls it, acquiescence.
Similarly, the number of False responses that a person gives to a set
of items may be a measure of a general tendency to disagree or
dissent, that is, to be negative. If an Undecided category is
provided, the number of such responses may be a measure of a
general tendency to avoid committing oneself or to be evasive.

If the majority of the items in an inventory are keyed True, then
a high score on the inventory may measure not only the variable of
interest but also the tendency of the subject to acquiesce. Similarly,
a low score may not necessarily indicate a lack of the variable, but
only the tendency of the subject to respond negatively. Comparable
complexities may enter into the interpretation of a score when the
majority of the items are keyed False.

I do not, however, consider the tendency to respond True or the 
tendency to respond False as of primary importance in personality 
inventories. My reason for this belief is that both of these response 
set hypotheses, at least in personality inventories, have been shown 
to lead to predictions which are contrary to fact, whereas an
alternative hypothesis leads to predictions that are in accord with
fact. This alternative hypothesis, I have called the social desirability 
or SD hypothesis (13). 

The SD hypothesis proposes that, just as individual differences 
have been found in the tendencies of subjects to respond True, 
Undecided, or False, regardless of item content, so also are there 
individual differences in the tendencies of subjects to give socially 
desirable responses to items in personality inventories, regardless 
of whether the socially desirable response is True or False. I have 
devised various scales to measure this tendency and these scales
are referred to as Social Desirability scales or SD scales (9, 13).

An SD scale is relatively easy to develop. Suppose we take any
heterogeneous set of personality statements and scale them for
social desirability. We desire items heterogeneous with respect to
content simply because we do not wish subjects who are to be given
the developed SD scale to believe we are measuring some particular
personality variable, such as, for example, dominance. On the basis
of the evidence cited previously, we expect to find a linear
relationship between probability of endorsement of these items and
their social desirability scale values. To develop an SD scale we take
those items with socially desirable scale values and key the True
response. For those items with socially undesirable scale values
we key the False response. A person's score on the scale is simply
the number of times he has given the keyed response in
self-description, that is, the number of socially desirable responses
he has given. As I have said earlier, I have developed a number of
such SD scales, but most of the research that has been done to date
is based upon a scale consisting of 39 items from the MMPI ( ).
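Here is a sketch of the keying rule just described, along with the resulting score. It assumes a neutral point of 5 on a 9-point scale. Items sitting exactly at that point are left unkeyed. The text does not say how such items were handled, so that treatment is also an assumption. The function names are hypothetical.

```python
def sd_key(scale_values, neutral=5.0):
    """Key True for socially desirable items, False for undesirable ones.

    Items exactly at the (assumed) neutral point are left unkeyed (None).
    """
    key = []
    for v in scale_values:
        if v > neutral:
            key.append(True)
        elif v < neutral:
            key.append(False)
        else:
            key.append(None)
    return key

def sd_score(answers, key):
    """Number of keyed (socially desirable) responses a subject gave."""
    return sum(1 for a, k in zip(answers, key) if k is not None and a == k)

key = sd_key([8.1, 2.3, 5.0, 6.4])  # [True, False, None, True]
print(sd_score([True, True, False, True], key))  # 2
```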
Now let us suppose we take any existing personality inventory
of interest and examine the scoring key for the items contained in
the inventory. If the trait being measured by the inventory is a
socially desirable trait, then we would expect to find a majority of
the keyed responses to be socially desirable responses. The scoring
key for the trait, in essence, would be much the same as the scoring
key we would obtain if we keyed the responses as we would in
developing an SD scale. If the inventory were scored with both keys,
we would expect to find a high and positive correlation between the
scores resulting from each key. This should, in general, be true of
all personality inventories designed to measure traits that are
themselves considered socially desirable. If a high score on a given
personality inventory indicates a trait that is itself considered
socially undesirable, then the scoring key would be much the reverse
of the key we would obtain if we keyed the items as in an SD scale.
With the two keys described, however, both sets of scores are based
upon the same items. If we have available a separate and
independently constructed SD scale, based upon a different set of
items, and correlate scores on this SD scale with those of a given
personality inventory, we are no longer correlating two sets of
scores necessarily dependent by virtue of scoring the same set of
items by two keys which are not themselves independent.
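Comparing an inventory's key with the SD key can be summarized as the proportion of items whose keyed response is also the socially desirable response. Values near 1 suggest a socially desirable trait, and values near 0 an undesirable one. This hypothetical helper only inspects the keys. As the text notes, correlating scores would still call for an independently constructed SD scale.

```python
def key_agreement(trait_key, desirable_response):
    """Proportion of items whose keyed response is the socially desirable one.

    Items that are unkeyed (None) in either key are skipped.
    """
    pairs = [(t, d) for t, d in zip(trait_key, desirable_response)
             if t is not None and d is not None]
    return sum(t == d for t, d in pairs) / len(pairs)

# Hypothetical four-item inventory key versus the SD keying of the same items
print(key_agreement([True, False, True, True], [True, False, False, True]))  # 0.75
```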


Correlations Between the SD Scale and Other Scales 

A person with a high score on the SD scale can be described as
one who has given a large number of socially desirable responses in
self-description, whereas a person with a low score can be described
as one who has given relatively few socially desirable responses in
self-description. If this is a stable and consistent personality
characteristic, we should find it evidenced in performance on a
variety of other personality inventories, regardless of the particular
traits supposedly being measured by these inventories. For example,
suppose we have an inventory on which a high score indicates a
socially desirable trait. Then individuals who are likely to give
socially desirable responses in self-description are also likely to
obtain high scores on this inventory. We should expect, therefore,
that individuals who score high on the SD scale will also score high
on the trait, whereas individuals with low scores on the SD scale will
score low on the trait. As a result, we should find a positive
correlation between scores on the SD scale and on the trait inventory.

Consider, for example, the Guilford-Martin (15) scales designed
to measure Cooperativeness, Agreeableness, and Objectivity. High
scores on these three scales indicate traits which, I believe, would
be considered favorable or socially desirable. The product-moment
correlations between scores on these three scales and scores on the
original 79-item SD scale for a sample of 106 college males and
females are .63, .53, and .71, respectively (9). There are three MMPI
scales on which high scores would be considered as indications of
socially desirable traits: the Dominance, Responsibility, and Status
scales. The tetrachoric correlations between scores on these scales
and scores on the 39-item SD scale, as reported by Merrill and
Heathers (24) for a sample of 155 males, are .49, .52, and .61,
respectively.

If we consider inventories on which high scores would be taken
as indications of socially undesirable traits, then, following the same
line of reasoning as we did earlier, we should expect to find
negative correlations between scores on these inventories and scores on
the SD scale.2 Within the MMPI we can find a wide variety of
scales for which high scores indicate socially undesirable traits.
Tetrachoric correlations were obtained by Merrill and Heathers (24)
between scores on these scales and scores on the 39-item SD scale
for a sample of 155 males. The tetrachoric correlations are as
follows:

                              Correlation with the 39-Item
MMPI Scales                   SD Scale

Neuroticism                       -.50
Dependency                        -.73
Hostility                         -.75
Manifest Anxiety                  -.84
Social Introversion               -.90

For the various clinical scales of the MMPI, Merrill and Heathers
(24) report the following tetrachoric correlations with the 39-item
SD scale for the same sample:

                              Correlation with the 39-Item
MMPI Scales                   SD Scale

Hs  Hypochondriasis               -.52
Pt  Psychasthenia                 -.85
Sc  Schizophrenia                 -.77
D   Depression                    -.61
Pd  Psychopathic Deviate          -.50
Hy  Hysteria                       .08
Pa  Paranoia                      -.09
Ma  Hypomania                     -.13

The three lowest correlations with SD are those of .08, -.09,
and -.13 with Hysteria, Paranoia, and Hypomania, respectively.
As a result of my work with the SD scale, I have become so
accustomed to finding substantial correlations between SD and scores
on other inventories that, when low correlations are obtained, I
seek for an explanation in terms of the relationship between the
social desirability scale values of the items and the manner in which
the item responses are keyed.

2 According to Berg's (2) deviant set hypothesis, ... avoided by
the majority of a group of subjects, deviant responses would ...
forms of deviancy from normative standards. On the SD scale,
deviant responses would result in low scores. On the SD scale, then,
deviancy would be more or less synonymous with "social
undesirability," which, in turn, has been shown to be related to
"abnormality," as measured by the clinical scales of the MMPI.
Factors Influencing Correlations with SD

In general, a low correlation between scores on the SD scale and 
scores on another personality inventory could result from at least 
two conditions. We know, for example, that, if the trait being 
measured by an inventory is itself socially undesirable, then, in 
general, most of the keyed responses will, in turn, be socially un- 
desirable. To obtain a high score on the inventory, the subject must, 
in fact, attribute to himself socially undesirable characteristics. Sup- 
pose, however, that it is possible to obtain at least some items such 
that the keyed response is a socially desirable response, yet the 
variable itself is socially undesirable. For example, it might happen 
that, using the method of criterion groups, some of the items in- 
cluded in the clinical scales of the MMPI are such that the keyed 
response is a socially desirable response. Then to these items, a 
socially desirable response would be keyed as indicating a socially 
undesirable variable. If a scale contains a number of such items, 
then this would tend to lower the correlation between scores on 
the scale and scores on the SD scale. 

Some evidence on this point is available. A study by Hanley (19) 
indicates that approximately 75 per cent of the items in the Sc scale 
have socially undesirable scale values, with 25 per cent falling in 
the neutral and socially desirable categories. For the D scale, on 
the other hand, only approximately 52 per cent of the items have 
socially undesirable scale values, whereas 48 per cent have scale 
values falling in the neutral and socially desirable categories. Han- 
ley classified the items in these two scales according to whether the 
True response was keyed or not keyed in determining scores on 
the scales. With the dichotomy, keyed and not keyed, point biserial 
correlations were obtained with the social desirability scale values 
of the items as the continuous variable. For the Sc items, the point 
biserial correlation was .84 and for the D items it was .58. These 
findings indicate that the keying of items in the D scale is somewhat 
less related to the social desirability scale values of the items than 
in the case of the Sc scale. We should, therefore, expect to find, as 
we have consistently found, that scores on the D scale correlate 
lower with scores on the SD scale than do those on the Sc scale. 
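Hanley's statistic is a point-biserial correlation. It correlates a dichotomy (keyed versus not keyed) with a continuous variable (the items' social desirability scale values), and it equals the product-moment r computed on a 0/1 coding. A minimal sketch with invented values follows. The sign depends on which category is coded 1, and the text does not say how Hanley coded it.

```python
import math

def point_biserial(in_group, values):
    """Point-biserial r between a True/False grouping and a continuous variable.

    Uses the population standard deviation of `values`; equal to Pearson r
    with the grouping coded 1 (True) and 0 (False).
    """
    n = len(values)
    group1 = [v for g, v in zip(in_group, values) if g]
    group0 = [v for g, v in zip(in_group, values) if not g]
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    p = len(group1) / n
    q = 1 - p
    m1 = sum(group1) / len(group1)
    m0 = sum(group0) / len(group0)
    return (m1 - m0) / sd * math.sqrt(p * q)

# Hypothetical: True response keyed for the first two items
keyed_true = [True, True, False, False]
sd_values = [2.0, 3.0, 7.0, 8.0]
print(round(point_biserial(keyed_true, sd_values), 3))  # -0.981
```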

There is another possible way in which we might obtain a low 
correlation between scores on the SD scale and scores on another 
personality inventory. We might, for example, have an inventory 
in which a substantial number of the items have social desirability
scale values in the central section of the psychological continuum. 
That is, these items may be relatively neutral with respect to their
social desirability scale values. If we have a number of items with
neutral scale values, a subject whose responses are primarily
influenced by social desirability considerations will be in a quandary
as to how he should respond. If the scale value of an item is truly
neutral, then there is no socially desirable or undesirable response
that can be made by a subject in answering it. In this situation,
we might argue, his responses are more likely to be influenced by
the content of the item. The correlation between SD and scores
on the inventory should thus decrease as the number of neutral
items in the inventory is increased.

Subtle Items 

Some years ago, Wiener (27) attempted to classify the items in 
the various MMPI scales into two groups, one of which he called 
subtle and the other obvious. For five of the MMPI scales he was 
able to find two such groups of items. The three scales, Hysteria, 
Paranoia, and Hypomania, for which low correlations with SD are 
reported by Merrill and Heathers (24), are among the five. The two 
additional scales are the D and Pd scales. Hanley (19) has suggested 
that subtle items are those with neutral social desirability scale 
values. I have expressed the opinion that not only may a neutral 
item be a subtle item, but that any item for which a socially
desirable response is keyed as a sign of a socially undesirable trait would
be a subtle item (13). In the case of socially desirable traits, a subtle 
item would be one for which the socially undesirable response is 
keyed. Recall that I define socially desirable and undesirable re- 
sponses on the basis of an item's social desirability scale value. 
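To make the hypothesis concrete, here is a minimal sketch of how an item could be labeled. It assumes True-False items scaled on a 9-point continuum (1 = extremely undesirable, 9 = extremely desirable) with a neutral band from 4.5 to 5.5; the scale range, the band limits, and the function names are all illustrative assumptions rather than values from the text.

```python
# Assumed 9-point social desirability continuum and a hypothetical neutral band.
NEUTRAL_LOW, NEUTRAL_HIGH = 4.5, 5.5


def desirable_answer(scale_value):
    """Socially desirable answer to a True-False item, or None if neutral."""
    if scale_value > NEUTRAL_HIGH:
        return "True"   # endorsing a desirable statement is the desirable answer
    if scale_value < NEUTRAL_LOW:
        return "False"  # denying an undesirable statement is the desirable answer
    return None         # neutral item: neither answer is more desirable


def classify_item(scale_value, keyed_answer, trait_is_desirable=False):
    """Label an item 'subtle' or 'obvious' under the hypothesis in the text."""
    desirable = desirable_answer(scale_value)
    if desirable is None:
        return "subtle"
    keyed_is_desirable = keyed_answer == desirable
    if trait_is_desirable:
        # Desirable trait: keying the undesirable answer makes the item subtle.
        return "obvious" if keyed_is_desirable else "subtle"
    # Undesirable trait: keying the desirable answer makes the item subtle.
    return "subtle" if keyed_is_desirable else "obvious"


print(classify_item(2.0, "True"))   # undesirable statement keyed True -> obvious
print(classify_item(8.0, "True"))   # desirable statement keyed True -> subtle
print(classify_item(5.0, "False"))  # neutral statement -> subtle
```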

Let us accept this hypothesis concerning subtle items, for the
moment, and see if we can predict what we should find when we
correlate scores on the SD scale with those on the subtle and obvious
scales of the MMPI. For the obvious scales, we should have more items
for which the keyed response is a socially undesirable response
than in the case of the subtle scales. The subtle scales, on the other
hand, should contain more neutral items and/or more items for
which the keyed response is a socially desirable response than in the
case of the obvious scales. If this argument is sound, then we should
find a substantial negative correlation between the SD scale and the
obvious scales. For the subtle scales, the correlations with SD
should definitely be lower, with the magnitude and sign of the
correlation depending solely upon how many neutral items the scale
contains and upon the number of items for which the socially
desirable response is keyed. If we have many items for which the
socially desirable response is keyed, the correlation with SD should
be positive.

At my suggestion, Fordyce and Rozynko (14) drew a sample of
50 MMPI records from the files of a VA hospital and obtained
product-moment correlations between scores on the 39-item SD
scale and total scores on the D, Pd, Pa, Ma, and Hy scales. They
then calculated the correlations between the SD scale and the
separate subtle and obvious scales. The results are shown below.


Correlations with the 39-Item SD Scale

MMPI Scale    Total    Obvious    Subtle
D             -.69     -.78        .03
Pd            -.67     -.85        .27
Pa            -.52     -.72        .06
Ma            -.03     -.53        .40
Hy            -.28     -.71        .54


Note that in every instance the negative correlation of SD with
the obvious scale is greater than it is with the total scale consisting
of both subtle and obvious items. This is as it should be. The
correlations between SD and the subtle scales, on the other hand, are
all positive in sign. These findings support the contention that
subtle scales contain neutral items and/or items for which a socially
desirable response is keyed to a greater extent than do the obvious
scales. The fact that the positive correlations between SD and the
subtle scales are not of the same magnitude as the negative
correlations between SD and the obvious scales indicates that the subtle
scales contain fewer keyed socially desirable responses than the
obvious scales contain keyed socially undesirable responses.

I have spent considerable time with the MMPI and social
desirability. This is not because I believe the MMPI to be the only
personality inventory in which the social desirability variable
operates. It is rather because the MMPI is such a widely used
personality inventory that the data in which I was interested were
readily available. The points I have made would apply, I believe,
equally well to any other inventory of the True-False kind.

Minimizing Social Desirability

If we do not desire scores on objective personality inventories to
be influenced by the social desirability variable, what can we do
about it? One suggestion is that we can attempt to correct for
social desirability by means of such scales as the SD scale. For
example, if we know the correlation between the SD scale and
scores on another personality inventory, then we can predict the
score that a person would receive on the inventory by means of a
linear regression of these scores on the SD scores. If we
then subtract the predicted score from the actual score, it can
readily be shown that these deviation scores will be uncorrelated
with the SD scores. Unfortunately, however, the correlations
between SD and scores on various personality inventories are of such
magnitude that the residuals or deviation scores may represent little
more than error variance. It is well known that the reliability of
difference scores is, in general, considerably lower than that of the
separate measures involved in the difference scores.
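The correction just described can be sketched in a few lines. The sketch assumes paired lists of SD scores and inventory scores; `residualize` and `difference_reliability` are illustrative names, and the data are invented. The second function uses the standard classical formula for the reliability of a difference between two measures of equal variance, which shows why such deviation scores can be mostly error.

```python
import math


def mean(xs):
    return sum(xs) / len(xs)


def pearson(x, y):
    """Pearson product-moment correlation."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def residualize(scores, sd):
    """Actual minus predicted inventory scores, where the prediction is the
    least-squares linear regression of `scores` on `sd`."""
    mx, my = mean(sd), mean(scores)
    slope = (sum((a - mx) * (b - my) for a, b in zip(sd, scores))
             / sum((a - mx) ** 2 for a in sd))
    intercept = my - slope * mx
    return [b - (intercept + slope * a) for a, b in zip(sd, scores)]


def difference_reliability(r_xx, r_yy, r_xy):
    """Reliability of X - Y for two measures with equal variances."""
    return ((r_xx + r_yy) / 2 - r_xy) / (1 - r_xy)


# Hypothetical data: SD scores and scores on some other inventory.
sd = [10, 12, 15, 18, 20, 25]
inventory = [30, 27, 26, 20, 19, 12]
deviations = residualize(inventory, sd)
print(round(pearson(deviations, sd), 10))                 # essentially zero
print(round(difference_reliability(0.90, 0.90, 0.70), 3))  # 0.667
```

With two measures each of reliability .90 correlating .70, the difference keeps a reliability of only about .67, and the higher the intercorrelation the lower it falls.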

Another possibility would be to search for items that are relatively
neutral with respect to their social desirability scale values. I do
not know whether this is a hopeless search or not. I can only say
that, on the basis of my experience in scaling personality items, the
number of items with relatively neutral scale values is much smaller
than the number I find with socially desirable or socially undesirable
scale values.

Along the same lines, we might seek items such that the socially
desirable response is the keyed response in scales designed to
measure socially undesirable variables. For scales designed to measure
socially desirable variables, we would, of course, attempt to find
items for which the socially undesirable response is keyed. The five
subtle scales of the MMPI are perhaps the closest approximations
we have, at the present time, to scales of this kind. Additional
research directed toward the development of subtle scales designed to
measure normal personality variables is needed.

A third approach to the minimization of social desirability in
personality inventories is the one I have used in developing the
Personal Preference Schedule. In this inventory, an attempt is made
to minimize the operation of the social desirability variable by
pairing statements representing different personality variables on the
basis of their social desirability scale values in such a way that the
social desirability scale values of the two statements are comparable.
The subject is then asked to choose between the two statements.
In this way, we hope to minimize the probability of the choice being
determined by social desirability considerations alone. I shall not
take time to cite the considerable evidence available which bears
upon the extent to which this forced-choice type of inventory does
minimize the social desirability variable. It is cited in detail in my
book on the social desirability variable in personality assessment
(13).
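The pairing idea can be illustrated with a simple greedy matcher. This is a sketch under stated assumptions, not the procedure actually used to build the Personal Preference Schedule: each statement carries a variable label and a social desirability scale value, the tolerance of 0.25 scale units is hypothetical, and the statements below are invented examples.

```python
from typing import List, Tuple

Statement = Tuple[str, str, float]  # (text, variable, social desirability scale value)


def pair_statements(statements: List[Statement], tolerance: float = 0.25):
    """Greedily pair statements from *different* variables whose social
    desirability scale values differ by no more than `tolerance`."""
    pool = sorted(statements, key=lambda s: s[2])  # order by scale value
    used = set()
    pairs = []
    for i, first in enumerate(pool):
        if i in used:
            continue
        for j in range(i + 1, len(pool)):
            second = pool[j]
            if second[2] - first[2] > tolerance:
                break  # sorted order: every later statement is even farther away
            if j not in used and second[1] != first[1]:
                pairs.append((first, second))
                used.update((i, j))
                break
    return pairs


statements = [
    ("I like to finish any job I begin.", "achievement", 7.1),
    ("I like to help friends when they are in trouble.", "nurturance", 7.2),
    ("I like to be the center of attention.", "exhibition", 4.4),
    ("I like to vary my daily routine.", "change", 4.6),
    ("I like to do things in my own way.", "autonomy", 5.9),
]
for a, b in pair_statements(statements):
    print(f"A: {a[0]}  |  B: {b[0]}")
```

Here the autonomy statement is left unpaired because no statement from another variable lies within the tolerance; a larger item pool, or an optimal matching algorithm in place of the greedy loop, would address that.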

Status of the Personality Inventory 

And now, what of the future of the objective personality
inventory? Let us first go back to the past. In 1945, Kornhauser (23)
published the results of a survey in which he queried specialists
about their satisfaction with various psychological tests. One of
the questions in the survey had to do with their satisfaction with
existing personality inventories and also with the Rorschach. The
results were more or less a tie, with 51 per cent expressing some
degree of satisfaction with personality inventories and 49 per cent
with the Rorschach.

I do not have the results of a comparable survey for the year
1958 rather than 1945. I do not think I would be overstating the
case, however, if I said that probably none of us who have attempted
to develop objective personality inventories is overly satisfied
with the results of our efforts. There is much that remains to be
done in the way of research before we will have personality
inventories that are judged as satisfactory as, let us say, achievement

Ever since L. K. Frank’s first use
of the term “projective method” in 1939 (15), there has been a rapid
mushrooming of techniques for encouraging an individual to reveal
aspects of his personality by the way in which he perceives,
organizes, or relates to potentially affect-laden, ambiguous stimuli.
Stemming largely from psychoanalytic theory, such projective techniques
range all the way from free association in relatively unstructured
situations to rather highly structured, formalized devices such as
the Thematic Apperception Test. Before considering the problems
of quantification and objective scoring, it might be instructive to
examine closely the assumptions implicit in the projective method
as contrasted to those underlying psychometric tests and
measurement theory.


Projective Compared with Psychometric Methods

Unlike the standardized aptitude test, the projective approach
deals with the idiomatic expression of the individual as revealed in
the context of his needs, fears, strivings, and ego-defensive behavior.
As Frank has so aptly stated, “The essential feature of a projective
technique is that it evokes from the subject what is, in various ways,
expressive of his private world and personality process” (16, p. 47).
Given any projective technique where the subject is offered a wide
latitude in which to reveal himself, the particular sample of
responses obtained is assumed to reflect significant aspects of the
subject’s personality organization, if only the examiner can fi

Macfarlane and Tuddenham have pointed out that such an
isomorphic assumption concerning the subject’s test protocol and
personality leads to three corollaries that are rarely explicit: (a)
belief that a protocol is a sufficiently extensive sampling of the
subject’s personality to warrant formulating judgments about it;
(b) belief that the psychological determinants of each and every
response are basic and general; and (c) belief that projective
techniques tap the durable essence of personality equally in different
individuals (27, p. 34). Many of the more wary, sophisticated
projectivists would argue that none of these three assumptions
necessarily follows from the basic assumption underlying the projective
method, and that even the best of projective test protocols is but a
tiny fragment of the total personality, fraught with innumerable
possibilities of misinterpretation. Nevertheless, in actual practice it
is difficult to avoid falling into the dogmatic position of
over-interpretation in an attempt to weave together a consistent picture
of the personality dynamics presumably reflected by the clinical
techniques employed. It can be argued that elaborate clinical
interpretations of personality from projective protocols often reveal
more about the personality of the clinician than about that of the
subject.

In contrast to a projective technique, a psychometric test is based
upon the fundamental assumption that an obtained score on the
test reflects a hypothetical “true” score which is characteristic of the
attribute in question for a given individual under specified
conditions and at a given moment in time. Any deviation of the
obtained score from the true score represents error of measurement,
which can be assessed provided one is willing to make certain
assumptions about the nature of such errors. By defining the true
score so that it includes all constant errors of measurement, the
discrepancy between obtained and true score becomes a random
error component. Since a random event by definition is uncorrelated
with any other event, a general theory of measurement can be
developed out of which components of error variance can be
estimated, both with regard to the concept of reliability and the concept
of validity (18).
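In the conventional notation of classical test theory (the symbols below are the usual textbook ones, not taken from this chapter), the assumptions just described can be written as follows, with reliability defined as the share of observed-score variance that is true-score variance:

```latex
\begin{aligned}
X &= T + E, \qquad \mathbb{E}(E) = 0, \qquad \operatorname{Cov}(T, E) = 0, \\
\sigma_X^2 &= \sigma_T^2 + \sigma_E^2, \\
r_{XX'} &= \frac{\sigma_T^2}{\sigma_X^2} = 1 - \frac{\sigma_E^2}{\sigma_X^2},
\qquad \sigma_E = \sigma_X \sqrt{1 - r_{XX'}} .
\end{aligned}
```

Because E is uncorrelated with T (and with the error on a parallel form X'), the correlation between parallel forms estimates the reliability, and from it the standard error of measurement follows directly.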

Contrary to the opinion of some writers (37), such psychometric
theory is not necessarily limited to a nomothetic universe where one
is interested in group or inter-individual differences. As Cattell (6)
has been quick to point out, one can legitimately utilize
psychometric theory for idiographic purposes by considering k different
measures on m different occasions for a single person. Nor need
psychometric theory be restricted to consideration of one response
variable at a time, despite the oft-heard criticism that a psychometric,
statistical, or quantitative approach is too atomistic to provide more
than a ridiculous caricature of the individual personality. While it
is true that most contemporary uses of test scores deal with isolated
traits, or at best with linear combinations of several traits, the advent
of configural scoring methods (30), the possibilities of profile analysis
(19), and other complex, multivariate procedures open new vistas
for effective utilization of psychometric theory in the study of the
individual personality.
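Cattell's point can be made concrete: arrange one person's scores as an m x k table (occasions by measures) and correlate the measures across occasions rather than across people. This is the starting point of what Cattell called P-technique. The data below are hypothetical, and `measure_correlations` is an illustrative name.

```python
import math


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def measure_correlations(occasions):
    """occasions: m rows (occasions) x k columns (measures) for ONE person.
    Returns the k x k matrix of correlations between measures, computed
    across occasions, not across people."""
    k = len(occasions[0])
    columns = [[row[j] for row in occasions] for j in range(k)]
    return [[pearson(columns[a], columns[b]) for b in range(k)] for a in range(k)]


# Hypothetical: one person measured on 3 variables on 5 occasions.
john_doe = [
    [3, 10, 7],
    [5, 8, 7],
    [4, 9, 6],
    [6, 7, 8],
    [2, 11, 5],
]
for row in measure_correlations(john_doe):
    print([round(r, 2) for r in row])
```

In this invented record the first two measures move in exact opposition from occasion to occasion, so their within-person correlation is -1; a factor analysis of such a matrix would describe the structure of this one individual.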

Use of psychometric theory as a basis for assessment of personality
commits one to a trait theory of personality. Postulating some
sort of “true” score as a hypothetical construct to be inferred from
observed scores is tantamount to saying that John Doe has X amount
of the trait in question. It is not necessary, however, to think of
John’s possession of the trait as a “fixed” quantity. An individual’s
true score remains invariant only so long as the specific testing
conditions remain constant and there is no real change in the individual
with respect to the trait in question. A primary purpose of test
standardization is to minimize constant sources of error that are
ordinarily confounded with the inferred true score. Only errors of
measurement that are random in nature can be adequately assessed
and taken into account by the usual concepts of reliability and
validity within contemporary psychometric theory.

Rosenzweig (37) has observed that assessment procedures can be
ordered on a continuum depending upon the degree of structuring
and control introduced by the assessor. At one extreme are the
completely qualitative, unstructured methods of psychoanalysis:
free association by a patient in the presence of an analyst. At the
other extreme are highly structured paper-and-pencil tests which
meet all the standards of psychometric theory. Projective
techniques are seen as falling somewhere in between, the particular
position on the continuum depending upon the degree of
standardization and control. In most instances, the projectivist has tried to
preserve the qualitative, idiographic essence of the projective
method while also searching for ways in which to categorize,
quantify, and standardize the response variables underlying test
behavior. He would like to have a technique for assessing personality
which covers a wide band of the above continuum with a high
degree of power throughout the range. Very few psychologists
indeed have completely and consistently refrained from some form
of abstraction later leading to quantification.

As soon as an individual decides to classify and enumerate any
characteristics of a subject’s responses to a projective technique,
however crude and elementary the system, he has shifted from a
purely projective point of view to a psychometric frame of reference.
Such measurement may be quite nominal and only faintly resemble
full-blown quantification. Nevertheless he has made the first and
most significant step by classification of responses. For example, to
classify a given response to an inkblot as a W assigns meaning to
the response that transcends the idiosyncratic, private world of the
subject. Unless one considers such symbols as W, D, and d mere
shorthand devices that have no real meaning beyond calling one’s
attention to certain aspects of the protocol, the symbols take on
nominal characteristics of measurement. Those subjects who use the
whole inkblot are seen as one class of individuals (W-tendency type),
while those who use only a small part of the inkblot for their
response are seen as another class (d-tendency type).

Such symbols of classification can be considered “signs” depicting 
specified characteristics abstracted from the raw protocol. More 
or less elaborate patterns of signs can be derived, either rationally 
or empirically, which point toward a syndrome or personality at- 
tribute to be inferred from the protocol. The pattern of signs may 
be complex and highly conditional so that predictive state- 
ments of the “if A and B but not C, then X” type can be formulated. 
Or the set of admissible signs may all contribute to some sort of
“global” measure like the adjustment score derived from the 
Rorschach by Munroe’s Inspection Technique (32). Such clusters 
of signs may have some pragmatic value in predicting a criterion, 
but they have a disjunctive quality or arbitrariness which makes 
any theoretical interpretation exceedingly difficult. In most in- 
stances when a series of responses is classified, some types of re- 
sponse will appear more than once. Counting of such response 
frequencies is the first step in the construction of a quantitative 
scoring system. A Rorschach protocol with 10 movement responses 
would be thought of as indicating a greater tendency to see move- 
ment than a record with only two movement responses. Such a 
statement implies a crude kind of ordinal scale by which people 
can be ordered according to their degrees of M-tendency, provided
the total number of responses is controlled. 
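The steps just described (a configural sign rule, a "global" sign count, and response frequencies turned into a crude ordinal index) can be sketched as follows. The sign names A, B, and C mirror the placeholder rule in the text, the toy records are invented, and dividing the M count by the total number of responses R is one simple way, assumed here for illustration, of controlling the number of responses.

```python
from collections import Counter


def conditional_sign(signs):
    """Configural rule of the form 'if A and B but not C, then X'."""
    return signs["A"] and signs["B"] and not signs["C"]


def global_score(signs):
    """'Global' index: how many admissible signs are present."""
    return sum(1 for present in signs.values() if present)


def m_tendency(determinants):
    """Share of movement (M) responses among all R responses in a record."""
    counts = Counter(determinants)
    return counts["M"] / len(determinants)


signs = {"A": True, "B": True, "C": False}
print(conditional_sign(signs), global_score(signs))  # True 2

record_1 = ["M"] * 10 + ["F"] * 30   # 10 M responses, R = 40
record_2 = ["M"] * 2 + ["F"] * 6     # 2 M responses,  R = 8
print(m_tendency(record_1), m_tendency(record_2))    # 0.25 0.25
```

The raw counts (10 versus 2) would place the first record far higher in M-tendency; once R is taken into account the two records are tied, which is why the ordinal comparison holds only when the total number of responses is controlled.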
As one becomes engrossed with the counting of symbols it is
very easy to forget the nature of the projective material being
classified. In his eagerness to make a given technique meet the
demands of both psychometric and projective theory, the
psychologist often compromises the two sets of conflicting standards to the
point where the technique fails to accomplish either aim. There
are some projective devices that should always be treated by
qualitative methods of analysis, since almost any attempt to abstract
quantitative scores will fail to have any meaning. Other projective
techniques may be altered sufficiently to yield scores meeting
acceptable psychometric standards while at the same time preserving
the projective nature of the task. It is too much to expect a
technique designed originally as a purely projective method to lend
itself to a meaningful kind of quantification without some revision,
and in many projective techniques no amount of revision will
produce adequate scores in the true psychometric sense.

Frank (16) has divided the projective techniques into five general
kinds: constructive, interpretive, constitutive, cathartic, and
refractive. The constructive methods consist of those techniques which
require the subject to arrange materials into larger configurations
or to produce drawings, as in the Draw-A-Person Test. The
interpretive methods are primarily verbal associational techniques such
as the Thematic Apperception Test. The best-known example of a
constitutive method is the Rorschach, in which the subject must
organize relatively amorphous, unstructured inkblots into meaningful
concepts. While most projective techniques may stimulate cathartic
reactions, some, such as play therapy with dolls, are designed
specifically for this purpose. The last of Frank’s classes, the refractive
method, is based upon the fact that any conventionalized mode of
communication (handwriting, gestures, and other forms of expressive
movement) may be used as an approach to the individuality
of a person.

The above classification serves as a convenient basis for a more
detailed discussion of scoring problems and quantification in the
analysis of projective techniques. Since cathartic methods cut across
the other procedures, and since the analysis of expressive movement
and individual style of communication can be considered as a special
topic apart from more conventional projective methods, only the
first three of Frank’s classes will be discussed. Considerably more
attention will be given to the Rorschach and related techniques
than to the constructive or interpretive methods, partly because the
Rorschach has been studied longer and more exhaustively than any
other projective test and partly because it provides an unusually
good illustration of various problems of quantification encountered
throughout the projective-psychometric continuum.

Constructive Methods

The way in which a child or adult arranges miniature life toys, 
draws a figure of a man or woman, or builds mosaics from colored 
pieces can reveal a great deal about his personality. Generally 
speaking, however, such creative productions are very difficult to 
analyze in any objective, quantitative fashion. Most clinicians use
only qualitative procedures when dealing with constructive methods.
Occasionally the characteristics of a construction may be classified 
to formalize its description, but inferences regarding personality, 
whether based upon symbolic interpretations or more direct ex- 
pressions by the subject, remain at the clinical intuitive level. Of 
course, rating scales for recording clinical judgment can be em- 
ployed with such materials, as with any other individual response 
or style of expression. But it is not difficult to see why quantifica- 
tion in the psychometric sense has failed to prove useful in the 
analyses of drawings or other creative products, even though the 
situation may be rather highly structured as in the Bender-Gestalt 
Test. Usually the construction has to be viewed as a whole or as 
only a very small number of separate units analogous to test items. 
The configuration, color, shading, and other characteristics of a 
drawing are complex, defying quantification in the usual sense. 
Nevertheless, in some special cases, fairly successful attempts have 
been made to score objectively certain limited aspects of such 
productions. Several of these will be briefly discussed. 

Drawing a human figure has been employed rather extensively 
as a projective technique in recent years, largely due to the per- 
sistent studies of Karen Machover (28). Working primarily from a 
psychoanalytic point of view in which the drawing is assumed to 
reflect the body image of self, Machover and others have developed 
systems of graphic analysis utilizing a sign approach to the scoring 
of drawings. For full use of the system, the subject must draw both 
a man and a woman so that comparisons of self-sex and opposite- 
sex figures can be made. A good example of this graphic sign 
method is the scale of figure drawing items which is presumed to 
measure field-dependency (50). Sets of 40 items for men and 45 for 
women were constructed by Machover on the basis of a preliminary 
analysis. Criterion groups for the initial selection of items consisted
of college students with high and low field dependence as
measured by a battery of perceptual tests. A total score is obtained by
summing the number of signs checked during the detailed analysis
of the two figure drawings. Some of the signs are completely
objective, such as transparency, lack of ears, or hair shaded. Others,
like consistency rating and rigidity rating, are subjective and
require a clinical judge. For the most part, however, the list of signs
is sufficiently objective to merit further study.
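Criterion-group selection and unit-weighted scoring of this kind can be sketched as follows. The sign names are taken from the text, but the protocols and the 0.20 selection threshold are hypothetical, and this is not presented as Machover's actual selection rule: a sign is retained (keyed toward the high group) when it is checked at least 20 percentage points more often in the high criterion group than in the low one.

```python
def select_signs(high_group, low_group, min_difference=0.20):
    """Retain signs checked more often in the high criterion group.

    Each group is a list of protocols; a protocol is the set of signs
    checked for one subject. Returns {sign: (p_high, p_low)}."""
    all_signs = set().union(*high_group, *low_group)
    kept = {}
    for sign in sorted(all_signs):
        p_high = sum(sign in p for p in high_group) / len(high_group)
        p_low = sum(sign in p for p in low_group) / len(low_group)
        if p_high - p_low >= min_difference:
            kept[sign] = (p_high, p_low)
    return kept


def total_score(protocol, keyed_signs):
    """Unit-weighted total: number of keyed signs checked in a protocol."""
    return len(set(protocol) & set(keyed_signs))


# Hypothetical criterion groups (high and low field dependence).
high = [{"transparency", "lack of ears"}, {"transparency", "hair shaded"},
        {"transparency", "lack of ears", "hair shaded"}, {"lack of ears"}]
low = [{"hair shaded"}, {"hair shaded"}, set(), {"transparency"}]
keyed = select_signs(high, low)
print(keyed)
print(total_score({"transparency", "hair shaded"}, keyed))
```

In this toy sample "hair shaded" occurs equally often in both groups and is dropped, so a new protocol showing transparency and shaded hair earns a total of 1.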

Graphic signs have been used with similar success by Pascal and
Suttell in the objective scoring of drawings in the Bender-Gestalt
Test (34). The test consists of nine geometric forms that are copied
by the subject. The number of scorable signs on each design varies
from 10 to 13, with seven additional signs dealing with the total
configuration of all nine drawings. Each sign is given a numerical
weight varying from one to eight. The size of the weight was
empirically determined in earlier studies differentiating normals
from such groups as psychotics and organics.

A single score is obtained by summing the weights of positive
signs; the higher the score, the more pathological the record.
Although much valuable information may have been sacrificed in
obtaining a single quantitative index, the resulting
score has sufficiently high reliability and validity in a variety of
situations to prove highly useful as a screening procedure.
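The weighted-sign total can be sketched as follows. The sign labels and weights are placeholders chosen within the stated range of one to eight; they are not the published Pascal and Suttell signs or weights, and `weighted_score` is an illustrative name.

```python
# Placeholder signs and weights (hypothetical), kept within the 1-8 range.
WEIGHTS = {
    "design_1_sign_a": 2,
    "design_2_sign_b": 3,
    "design_4_sign_c": 5,
    "configuration_sign_d": 8,
}
assert all(1 <= w <= 8 for w in WEIGHTS.values())


def weighted_score(observed_signs, weights=WEIGHTS):
    """Sum the weights of the signs present; higher means more pathological."""
    unknown = set(observed_signs) - set(weights)
    if unknown:
        raise ValueError(f"signs without a weight: {sorted(unknown)}")
    return sum(weights[s] for s in set(observed_signs))


print(weighted_score({"design_1_sign_a", "configuration_sign_d"}))  # 10
```

Rejecting unknown signs, rather than silently scoring them as zero, is a design choice meant to catch scoring-sheet typos.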

A third variation of semistructured drawing which represents
an attempt at objective quantification is the Drawing Completion
Test described by Kinget (23). Eight squares are presented to the
subject, each containing small but suggestive stimuli, such as a dot,
a wavy line, or a black square, around which the subject draws
whatever he wishes. Kinget has attempted to develop a graphic
system with a series of crudely quantitative variables, some based
on content analysis and others dealing with style and expressive
features of the drawings. A personality profile is constructed by
recording signs and then adding them together in more abstract
categories, somewhat like the first attempts to quantify the
Rorschach. While the rationale behind the scoring system is highly
speculative and smacks of armchair analysis without adequate
empirical support, the method itself is interesting and sufficiently
novel to deserve careful study.

Working with spontaneous finger paintings, a construction which
has proved very difficult to quantify, Dorken (10) has developed a
series of objectively defined rating scales for energy output, affective
range, contact with reality, and clarity. Pictorial norms were used
as points of reference to anchor the scales. The variable, Affective
Range, illustrates the technique. The “spontaneous” colors, red and
yellow, were each assigned scale values of three; blue and green
were given values of two each; and the “somber” colors, black and
brown, were each scored one. Combination colors were scored in
relation to this primary scale. Test-retest reliability ranged from
.13 to .84, depending upon the sample and the time interval between
administrations. By rating a series of finger paintings, reasonably
adequate summary scores on the four variables defined by Dorken
should be possible.
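A sketch of the Affective Range idea, assuming the primary color values as given above. The rule for combination colors is not reproduced, so combinations are left out, and summarizing a series of paintings by the mean color value is an assumption made for this sketch. Test-retest reliability is then the product-moment correlation between the summary scores from two administrations; the paintings below are invented.

```python
import math

# Primary color values as described for Affective Range.
COLOR_VALUES = {"red": 3, "yellow": 3, "blue": 2, "green": 2, "black": 1, "brown": 1}


def affective_range(paintings):
    """Mean color value across a series of paintings (assumed summary).
    Each painting is a list of the primary colors it uses."""
    values = [COLOR_VALUES[color] for painting in paintings for color in painting]
    return sum(values) / len(values)


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical: four subjects, each producing a series of paintings twice.
first = [
    [["red", "yellow"], ["red"]],
    [["blue", "black"], ["brown"]],
    [["green", "yellow"], ["blue"]],
    [["black"], ["brown", "black"]],
]
second = [
    [["yellow"], ["red", "blue"]],
    [["black"], ["blue", "brown"]],
    [["green"], ["yellow", "green"]],
    [["brown"], ["black"]],
]
scores_1 = [affective_range(s) for s in first]
scores_2 = [affective_range(s) for s in second]
print("test-retest r =", round(pearson(scores_1, scores_2), 2))
```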

It is significant to note that in each of the above examples of
attempts to achieve objective scoring of projective techniques, the
degree of quantification is pretty much limited to the composite sign
approach, in which numerous signs are scored, weighted, and
summed to yield some sort of “global” but quantitative index
which is purported to reflect important dimensions of personality.
Ideally, the sign approach should begin with sufficient theoretical
rationale to construct a coherent system. After careful operational
definition of each sign, the objectivity of scoring should be
determined by having at least two trained individuals independently
score a large number of protocols. In some instances where
several signs have similar rationales in their definition, their
consistency should be examined empirically in a study to validate the
construct which they theoretically represent (7). In most cases,
however, a straight empirical analysis without regard for the
construct in question will be undertaken with the practical aim in
mind of establishing a weighting system that has maximum efficiency
for predicting some criterion. In any case, the burden of proof
concerning the reliability and objectivity of any proposed scoring system
rests with the individual who proposes it.
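A minimal sketch may make the composite sign approach concrete. The sign names and weights below are hypothetical placeholders, not taken from any of the systems cited. Each protocol is scored for the presence (1) or absence (0) of each sign, the weighted signs are summed, and the objectivity of scoring is checked by correlating the totals that two independent scorers assign to the same protocols.

```python
import math

# Hypothetical signs and weights, for illustration only.
WEIGHTS = {"sign_a": 2.0, "sign_b": 1.0, "sign_c": 1.0}


def composite_score(signs, weights=WEIGHTS):
    """Weighted sum of the signs scored in one protocol (a missing sign counts as absent)."""
    return sum(w * signs.get(name, 0) for name, w in weights.items())


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Two trained scorers independently score the same four (hypothetical) protocols.
scorer_1 = [{"sign_a": 1}, {"sign_b": 1, "sign_c": 1}, {}, {"sign_a": 1, "sign_b": 1}]
scorer_2 = [{"sign_a": 1}, {"sign_b": 1}, {}, {"sign_a": 1, "sign_b": 1}]

totals_1 = [composite_score(p) for p in scorer_1]
totals_2 = [composite_score(p) for p in scorer_2]
inter_scorer_r = pearson_r(totals_1, totals_2)
```

In a straight empirical study the weights would instead be chosen, for example by multiple regression, to give the best prediction of the criterion; they are fixed here only to keep the sketch short.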


Interpretive Methods 

Assessing personality from the way in which an individual reveals
his fantasy life in telling a story or interpreting a scene goes back
through centuries of mankind. However, the first notable attempt
to develop a projective test for uncovering a person’s needs, wishes,
and related fantasies by having him tell stories was made by Morgan
and Murray in 1935 (31). In the past 20 years, Murray’s Thematic
Apperception Test (TAT) has become a standard projective
technique, second only to the Rorschach in its widespread use both
in the clinic and laboratory. Numerous other interpretive methods,
such as Rosenzweig’s Picture-Frustration Study (36), Bellak’s Children’s
Apperception Test (22), and Shneidman’s Make-A-Picture-Story
Test (43), to mention but a few, stem more or less directly from
Murray’s pioneering work and attest to the fruitfulness of the basic
method.

Interpretive methods range all the way from one end of the
projective-psychometric continuum to the other. Representative of the
purely projective approach is the standard TAT analyzed entirely
in a qualitative manner, focusing upon the content of stories and
stylistic aspects of the story telling; as illustrated by Stein (44), such
analysis draws heavily upon careful deduction and clinical intuition.
Only one step removed from this intuitive approach is the more
formal kind of qualitative analysis in which various characteristics
of each story are classified according to theme expressed, kinds of
affect, need categories, and the like. Such qualitative systems tend
to vary considerably according to the predilection of the analyst.
Representative of the diverse approaches to analysis of TAT
protocols is Shneidman’s (43) compilation of systems used by 15 different
authorities working with the same TAT record.

Several investigators have developed sets of rating scales to be
used with the TAT. One of the most extensive systems is Hartman’s
(21), consisting of five-point scales for 65 categories covering
thematic elements, feeling qualities, topics of reference, and more
formal characteristics, each of which can be scored for a given
story. Total scores are obtained by summing ratings across stories.
While such scales utilize the clinical skill of the interpreter, serious
difficulties often arise when one is concerned with the objectivity
of the scoring. When categories deal with the manifest aspects of
a story, independent raters can generally agree at a satisfactory level
to insure fair objectivity. But as soon as attention is focused upon
covert aspects of the response or upon the personality of the
storyteller rather than his production, agreement falls off sharply (46).

The reason for this greater subjectivity when dealing with the
personality of the subject is apparent when one examines closely
the nature of the factors influencing response to a TAT picture.
Holt (22) discusses nine different determinants of the manifest
response, ranging from situational context to personal style of the
storyteller. The interpreter is faced with the very complex task of
weighing the probable influence of each factor before he can arrive
at an interpretation of the subject’s personality. It is somewhat like
having an equation with nine variables, several of which can be
partially discounted while most remain unknown quantities. Several
judges will weigh the unknowns quite differently, resulting [...]

This difference between test-oriented systems dealing with formal
characteristics of the response and personality-oriented systems in
which the interpreter makes direct inferences concerning the
personality of the storyteller is fundamental. The more superficial and
concrete the system, the more objective the scoring and the less
relevant the derived variables to the personality of the subject.
Young (51) developed a set of 23 well-defined traits, such as Anxiety,
Dominance, and Need to be Loved, which could be used in rating
the personality of the interpreter as well as the subject. Ten
trained interpreters independently rated 12 TAT stories from seven
different individuals, a total of 84 responses, on each of the 23 traits.
Ratings on the same 23 traits were obtained for each of the 10
interpreters by a sociometric method. Even though the average
agreement among interpreters was fairly high for such
personality-oriented variables, differences in the interpreters’ ratings proved
significantly related to their own personalities, demonstrating the
intrinsic subjectivity of such methods of analysis.

Several fairly objective variables dealing with story content seem
sufficiently relevant to important aspects of the storyteller’s
personality to merit special attention. McClelland and his colleagues
(26) have carefully developed the personality construct, Achievement
Motive, and have demonstrated how it can be reliably scored
in TAT stories. The scoring involves simple classifications of
response elements by objective criteria that are then summed to yield
an overall index of the individual’s Need-Achievement score. A
number of experimental studies are also cited indicating the validity
of the personality construct.

A similar careful derivation of two test-oriented variables of
relevance to the storyteller’s personality was undertaken by Eron (1[...]).
Using well-anchored rating scales, Eron and co-workers developed
fairly objective measures of emotional tone and outcome that could
be applied to single responses and summed to get an overall score.
Both variables have satisfactory inter-scorer reliabilities, .86 for
emotional tone and .75 for outcome. Eron is chiefly concerned with
the development of norms for TAT themes that can be used to
define the general characteristics of each card in terms of the ease
with which certain themes are evoked. Such data for the TAT can
be roughly thought of as analogous to difficulty level or other
item parameters in aptitude tests. A recent application of Eron’s
approach demonstrates how Guttman’s scaling method can be
employed using normative TAT data to construct a unidimensional
scale for need-Sex (1).
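The logic of Guttman’s method can be illustrated with the coefficient of reproducibility. The sketch below is not the procedure of the study cited; it assumes each row is one subject and each column one hypothetical TAT card scored 1 if the theme appears. On a perfect unidimensional scale a subject’s total score tells exactly which cards he responded to, and reproducibility measures how closely the data approach that ideal.

```python
def reproducibility(responses):
    """Guttman coefficient of reproducibility.

    responses: one list of 0/1 item scores per subject, all the same length.
    Errors are counted against the ideal pattern implied by each subject's total:
    a subject with total k should pass exactly the k most popular items.
    """
    n_items = len(responses[0])
    popularity = [sum(row[j] for row in responses) for j in range(n_items)]
    # Items ordered from most to least often endorsed ("easiest" first).
    order = sorted(range(n_items), key=lambda j: -popularity[j])
    errors = 0
    for row in responses:
        k = sum(row)
        ideal = [0] * n_items
        for j in order[:k]:
            ideal[j] = 1
        errors += sum(1 for j in range(n_items) if row[j] != ideal[j])
    return 1 - errors / (len(responses) * n_items)


# Hypothetical data: rows are subjects, columns are cards scored 1 if the theme appears.
perfect = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
imperfect = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 1]]
```

The last subject in `imperfect` endorses only the least popular card, which produces two errors out of twelve responses. Guttman’s conventional minimum for accepting a set of items as a scale is a reproducibility of about .90.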

A final example of an objective approach to the scoring of the
TAT is one devised recently by Dana (9). Three fundamental aspects
of test behavior (approach to the situation, normality of
response, and rarity of response) were used by Dana to define three
variables amenable to objective scoring: Perceptual Organization,
Perceptual Range, and Perceptual Personalization. Inter-scorer
reliability in terms of percentage agreement between independent
judges ranged from 76 to 94 per cent for the three scoring categories in a
study of 150 TAT stories. The unique aspect of Dana’s approach
is the fact that these three variables are sufficiently pertinent to a
large variety of projective techniques to permit inter-test
comparisons for sharpening the validity of the personality constructs
involved.
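Percentage agreement, the index Dana reports, is simple to compute. The sketch below assumes each judge assigns one category code per story; the codes are hypothetical.

```python
def percent_agreement(judge_a, judge_b):
    """Percentage of stories that two independent judges scored identically."""
    if len(judge_a) != len(judge_b) or not judge_a:
        raise ValueError("both judges must score the same, non-empty set of stories")
    matches = sum(a == b for a, b in zip(judge_a, judge_b))
    return 100.0 * matches / len(judge_a)


# Hypothetical category codes two judges gave the same five stories.
judge_a = [2, 1, 1, 0, 2]
judge_b = [2, 1, 0, 0, 2]
agreement = percent_agreement(judge_a, judge_b)
```

Here the judges agree on four of the five stories, or 80 per cent. The index makes no allowance for agreement expected by chance alone, so it tends to run high when one category is used for most stories.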

Variations of the sentence completion method provide much more
suitable data for psychometric development than the TAT. The
technique consists of providing the subject with a list of incomplete
sentences to which he responds with whatever completions come
to mind. By wise selection of sentence stems, content fairly similar
to the thematic apperception methods can be obtained. Of course,
the response is much more highly structured and discrete from one
item to the next than is the case with the TAT. Herein lies the chief
virtue of the method with respect to quantification.

Rotter and Willerman (38) developed one of the first sentence
completion tests with high objectivity. Designed for large-scale
screening purposes in the Army Air Force, their 40-item version
yielded a single adjustment score having inter-scorer reliability of
.89 and split-half reliability of .85. A refined version of this test
designed for college students, the Rotter Incomplete Sentences
Blank (39), has an objective scoring manual with reported inter-scorer
reliability of .96 and split-half reliability of .84, unusually high for
a projective technique.
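A split-half coefficient of this kind is usually obtained by correlating scores on two halves of the test and stepping the result up with the Spearman-Brown formula. The text does not say how the halves were formed; the odd-even split below is an assumption, and the item scores are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def spearman_brown(r_half):
    """Estimate full-length reliability from the correlation between two half-tests."""
    return 2 * r_half / (1 + r_half)


def split_half_reliability(item_scores):
    """Odd-even split-half reliability; item_scores holds one row of item scores per subject."""
    odd = [sum(row[0::2]) for row in item_scores]
    even = [sum(row[1::2]) for row in item_scores]
    return spearman_brown(pearson_r(odd, even))


# Hypothetical scores of four subjects on four sentence-completion items.
rows = [[2, 1, 3, 2], [1, 1, 2, 1], [3, 2, 3, 3], [0, 1, 1, 0]]
reliability = split_half_reliability(rows)
```

The step-up is needed because each half is only half as long as the full test, and shorter tests are less reliable.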

Trites and his colleagues (47) developed a military version of the
sentence completion method to a high degree of objectivity while
at the same time dealing with a number of response categories
rather than just one. A scoring manual was written on the basis
of 1038 test protocols which yielded inter-scorer agreement ranging
from 80 to 96 for eight major variables: Conformity, Ego Esteem,
Gregariousness, Sexuality Attitudes, Air Force-oriented Motivation,
Hostility, Insecurity, and Unscorable Response. Although there
is little direct evidence to support the validity of these variables
with respect to the personality constructs implied, in a later factor
analysis of inter-item correlations where the items had been scored
dichotomously as indicating either a positive or negative attitude
with reference to adjustment to flying, Trites (46) obtained four
factors which were meaningfully linked to several of the original
major variables.

It is instructive to note the characteristics of the sentence
completion method which are responsible for achievement of satisfactory
psychometric standards. Unlike the TAT, the number of discrete
items can be very large, making possible an atomistic treatment
of test elements without undue distortion of the technique. Where
the TAT has at most 20 pictures, each with an infinite variety of
complex responses possible, the sentence completion method has
highly structured items for which the variety and extent of responses
are relatively limited. The more circumscribed nature of the
technique makes possible the development of an objective scoring
manual for any variables that may be present in the response. That
such psychometric treatment does not necessarily reduce the
usefulness of a projective method is demonstrated by the repeatedly
high validity obtained for the Rotter Incomplete Sentences Blank
in assessing level of personal adjustment (39).


Constitutive Methods 

The Rorschach test stands alone among projective techniques
in the amount of attention, both clinical and experimental, which it
has received during the past twenty years, and it illustrates problems
encountered in scoring responses to constitutive methods.
Quantitative analysis of responses to inkblots has ranged all the way from
one extreme of the projective-psychometric continuum to the other.
Some writers (25, 41) have pointed out how the Rorschach can be
dealt with in a purely qualitative manner, emphasizing the dynamic
and symbolic nature of the content and leaning heavily upon
psychoanalytic theory and the intuitive skills of a clinician.
Associations to inkblots are seen as only one step removed from completely
free association in the psychoanalytic session. Others (20, 33) have
shown how highly structured and completely objective
multiple-choice methods can be applied to the study of individual differences
in perception of inkblots. And curiously enough, the same 10
inkblots are used throughout!

To what extent are these various degrees of structuring and
quantification based upon sound principles of measurement theory?
Does the Rorschach really span the entire projective-psychometric
continuum with the high degree of power claimed by some of its
proponents?

The most rudimentary form of quantification in the Rorschach
is the assigning of symbols to certain kinds of responses which are
then looked upon as signs pointing to various personality attributes
or nosological classes. An excellent example of such a classification
of qualitative signs is the analysis of verbalization described by
Rapaport (35), who presents a very careful rationale for the scoring of
such pathognomic verbalizations as confabulations, contaminations,
confusion, absurd responses, and ideas of reference. Such signs are
not additive except in the very crude sense that a number of positive
signs in a single record tend to pile up in confirming the diagnosis.

The widely used “formal” scoring methods for the Rorschach
represent attempts to measure the perceptual variables implicit in
the response. The complex nature of the stimulus permits a wide
latitude of location, of determinants, and of conceptual content. Once
decisions have been made as to what constitutes a discrete response,
the number of such responses to a given inkblot or to all 10
Rorschach plates can be determined. Although there are some minor
problems encountered in deciding when a verbalization is truly a
response for purposes of scoring, one can safely assume that
inter-scorer agreement as to number of responses (R) is quite high
regardless of the judge’s theoretical position. Similarly, the scoring
of location, at least in its gross elements of whole, usual large detail,
or small and unusual detail, does not pose serious problems in the
attaining of reasonable objectivity. Aside from specialized uses of
content such as Elizur’s anxiety score (11), the categorizing of
concepts into human, animal, and other generic classes is quite
straightforward also. The greatest difficulties in achieving scoring
objectivity arise in the realm of response determinants.

Trying to determine those stimulus attributes which are
responsible for eliciting a given response amounts to a kind of global
psychophysics for which the general laws have yet to be worked out.
Although logical in their conception, most scoring systems for
determinants involve a number of highly arbitrary decisions, the
wisdom of which is highly debatable. The subjectivity of the method
and the influence of factors extraneous to the blots, such as the
examiner-subject interaction (40) and variation in style of inquiry (17),
raise troublesome questions concerning the meaning of scores once
achieved.
Presumably the inquiry phase of the Rorschach is designed to
discover the characteristics of the inkblot which prompted the
subject to give a response. The subject is asked by rather vague and
indirect questions to introspect, to analyze the perceptual process,
and to report to the examiner what about the blot suggests, for
example, “a bloody finger,” or “a pretty flower.” A helpful subject
who senses what the examiner is after may reply by saying, “It’s
shaped like a man’s thumb and is colored red, suggesting blood.”
More than likely, however, the subject will say, “It just looks like it
to me,” leaving the examiner about where he started. And even if
the subject does mention the color as playing a part in the response,
do we have any way of knowing whether the subject would have
reported blood in the absence of color? How do we know it wasn’t
the combination of form and shading that suggested a bloody
thumb? The unfortunate fact is that we simply don’t know,
although recent studies by Baughman (2) provide a better basis for
guessing.

Zubin (52) has recognized this problem and has tried to overcome
it by introducing a much more exhaustive inquiry than the usual
brief, indirect questioning. In addition to asking many more
questions per response, he has experimented with inquiry immediately
following the response rather than waiting until all 10 inkblots have
been administered. Sixty scales were constructed that could be
applied in scoring a single response, provided the inquiry was
sufficiently exhaustive. Five scales deal with location, six with the
objective attributes of the stimulus, six with determinants or the
relative importance of stimulus attributes in the formation of the
percept, 14 with interpretation categories such as surface texture
or strength of movement, three with organization activity, 15 with
content, and 11 with other aspects of the single response such as
reaction time and popularity. In addition, there are six scales
dealing with variables present in the protocol as a whole. When one
stops to think that Rorschach records frequently contain upwards
of 50 responses, the amount of energy invested in scoring 60 scales
on each response is tremendous.
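The arithmetic behind “tremendous” is easy to check. The scale counts below are those listed in the text; the sketch assumes every scale is rated for every response, and the 50-response record is the size the text mentions.

```python
# Scale counts for Zubin's single-response scoring, as listed in the text above.
SCALES_PER_RESPONSE = {
    "location": 5,
    "objective stimulus attributes": 6,
    "determinants": 6,
    "interpretation categories": 14,
    "organization activity": 3,
    "content": 15,
    "other single-response aspects": 11,
}
PROTOCOL_SCALES = 6  # scales applied once to the record as a whole


def ratings_required(n_responses):
    """Number of separate ratings needed to score one record with every scale."""
    per_response = sum(SCALES_PER_RESPONSE.values())
    return n_responses * per_response + PROTOCOL_SCALES


ratings_for_50 = ratings_required(50)
```

The seven groups add up to the 60 single-response scales, so a 50-response record calls for 3,000 response ratings plus the six whole-protocol ratings, 3,006 judgments in all.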

If a sufficient amount of information were available about the
objective stimulus attributes and the relationships between these
attributes and characteristics of the response, the amount of work
required to utilize Zubin’s system might be justified. However, the
very nature of the complex stimulus confronting the subject in the
form of an inkblot defies all but the crudest, global type of
description as far as the specific stimulus attributes are concerned. With
respect to the determinants or global psychophysics of the reported
percept, even a highly trained introspectionist would be hard put
to verbalize accurately the relative importance of various inkblot
characteristics in forming the percept. Since the greatest value of
the Rorschach is claimed to be the study of psychopathology, where
the subject’s ability to introspect accurately may be seriously
impaired, there appears to be little real hope of obtaining the kind of
information necessary to use many of the scales Zubin has proposed.
Although Zubin’s system may not really increase the objectivity of
scoring for the Rorschach, since it is comprised largely of five-point
scales for recording clinical impressions, his exhaustive approach
immediately points out the fundamental weaknesses inherent in the
standard methods of scoring.

In addition to the fact that objective scoring for most inkblot
variables cannot be achieved without the use of arbitrary rules, the
standard Rorschach is inherently poor as a psychometric device in
some other important respects. Providing the subject with only ten
inkblots and then permitting him to give as many or as few
responses to each card as he wishes characteristically results in a set
of unreliable scores with sharply skewed distributions, the majority
of which fail to possess the properties of even rank-order
measurements. One record with an R of 20 may be comprised of single
responses to the first nine cards and 11 responses to Card X, while
another may consist of two responses per card. Any of the usual
scores, with the possible exception of form level, will have quite
different meanings in the two contrasting protocols, even though
the total number of responses is constant. Add to this the difficulties
arising when R varies from less than 10 to [...], and it is easy
to see why most quantitative studies involving the standard
Rorschach yield confusing or negative results.

In a general review of statistical methods applied to Rorschach
scores, Cronbach (8) has considered several ways in which the
confounding effect of R upon most other variables can be controlled:
(a) computing percentage ratios of each variable over R, (b)
removing the linear effect of R by partial regression techniques, (c)
reducing the effect of R by plotting the variable against R and
drawing a freehand line fitting the medians (a form of curvilinear
partial regression), or (d) dividing the sample into a number of
subgroups that are homogeneous with respect to R before proceeding
with any quantitative analyses of other variables. The usual
procedure of computing percentage ratios is highly unsatisfactory
because of the crude metric qualities of most Rorschach scores.
[...] (14) demonstrated that [...] scoring categories are usually
complexly related to R. Consequently, the usual linear regression
methods for removing the confounding effect of R will generally fail.
Given a standard free-response Rorschach, the only procedure left
for controlling R is to form subgroups according to R and analyze
each independently. But even this very inefficient procedure leaves
unanswered the serious criticism [...]
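Three of Cronbach’s four methods can be sketched in a few lines; the freehand median line of method (c) is a judgment procedure and is not shown. The R values and scores below are hypothetical. Note that method (b) removes only the straight-line component of the relation between a score and R, which is why it fails when, as the text argues, the relation is curvilinear.

```python
from bisect import bisect_left


def percentage_ratios(x, r):
    """Method (a): express each score as a percentage of the record's R."""
    return [100.0 * xi / ri for xi, ri in zip(x, r)]


def linear_residuals(x, r):
    """Method (b): remove the linear effect of R, keeping the part of x not predicted by R."""
    n = len(x)
    mean_r, mean_x = sum(r) / n, sum(x) / n
    s_rr = sum((ri - mean_r) ** 2 for ri in r)
    s_rx = sum((ri - mean_r) * (xi - mean_x) for ri, xi in zip(r, x))
    slope = s_rx / s_rr
    intercept = mean_x - slope * mean_r
    return [xi - (intercept + slope * ri) for xi, ri in zip(x, r)]


def subgroups_by_r(records, cutoffs):
    """Method (d): place (R, score) records in bands of R; cutoffs are inclusive upper limits."""
    groups = {}
    for r, x in records:
        groups.setdefault(bisect_left(cutoffs, r), []).append((r, x))
    return groups


# Hypothetical records: total responses R and one other score for four subjects.
R = [10, 20, 30, 40]
score = [2, 3, 7, 8]
residuals = linear_residuals(score, R)
bands = subgroups_by_r(zip(R, score), cutoffs=[15, 30])
```

The residuals have zero linear correlation with R by construction, but any curved relation between the score and R survives in them; the subgroups of method (d) avoid that assumption at the cost of much smaller samples.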

When R is a variable, most clinicians make allowance for it in a
crude, intuitive way. Buhler (5) goes still further and structures the
test administration so that three responses must be given to each
blot. Blake and Wilson ( ) get around the problem in part by
considering only the first response to each card. However, having
only 10 responses from which to obtain scores for variables which
occur rather rarely creates a whole host of problems in attempting
to achieve satisfactory standards of measurement.

The development of group procedures for administering the
Rorschach to large groups represents another attempt to achieve
more objectivity and standardization of testing conditions. Munroe
(32), Harrower (20), Sells (42), and others have demonstrated the
feasibility of group procedures, provided one is willing to sacrifice
certain aspects of the more unstructured, personalized individual
Rorschach. The usual procedure is to project each blot on a screen
for three minutes while the subject writes down his responses in a
standard booklet. The number of responses is not controlled. In the
inquiry, the subject is usually given a very simple, direct set of
questions concerning the role of shape, color, movement, and
shading, and he indicates location by drawing the outline of his
percept on a miniature replica of the blot.

Most of the scoring difficulties inherent in the standard Rorschach
are aggravated still further by use of such group methods. Whereas
in the standard Rorschach one at least has the opportunity for such
things as the scoring of verbalizations and individualized inquiry to
help clear up scoring problems, the group method deprives the
examiner of all but the most superficial cues for scoring
determinants, increasing further the arbitrary nature of the scoring.
If one uses standard paper-and-pencil aptitude tests as models
to be emulated, the most highly structured, psychometrically sound
form of the Rorschach would appear to be a multiple-choice test
with sufficiently standard instructions to permit its use with large
groups of subjects. Under pressure of screening demands during
wartime, Harrower and others (20) developed a multiple-choice
version in which the subject chooses from a list of thirty concepts
those three which look best to him for the particular blot in
question. Fifteen of the 30 available concepts presumably indicate
psychopathology, while the remainder reflect normality. Harrower’s
own system of scoring is unusual and unnecessarily complicated.
Normal answers are arbitrarily weighted “1” for any concept involving
human movement, “2” for any that represent a popular response,
“3” and “4” for those which involve color-form integration, and “5”
for space responses. The set of abnormal answers is assigned weights
varying from “6” to “9” in a similar arbitrary fashion. The total score
obtained by summing the weights for the concepts chosen is
confused in its meaning because of the arbitrary weighting system.
More recently, O’Reilly developed a simpler multiple-choice form
with 12 choices per blot: four from psychotic records, four from
neurotic records, and four from normals. The subject is asked to
select the two concepts which best describe the inkblot. Choices
are weighted on a three-point system ordered from normal through
neurotic to psychotic. Almost complete separation of normals and
psychotics was achieved in a cross-validation, although the neurotics
had only slightly higher total scores than did the normals.
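A sketch of how a form of this kind might be scored follows. The text says only that choices are weighted on a three-point system; the weights of 0, 1, and 2 for concepts drawn from normal, neurotic, and psychotic records are an assumption, and the function name is hypothetical.

```python
# Assumed weights: the text gives only a three-point system ordered
# from normal through neurotic to psychotic concepts.
CHOICE_WEIGHTS = {"normal": 0, "neurotic": 1, "psychotic": 2}


def multiple_choice_total(choices_per_blot):
    """Sum the weights of the two concepts chosen for each blot.

    choices_per_blot: one pair of source labels per blot, e.g. ("normal", "psychotic").
    """
    total = 0
    for pair in choices_per_blot:
        if len(pair) != 2:
            raise ValueError("exactly two concepts must be chosen for each blot")
        total += sum(CHOICE_WEIGHTS[label] for label in pair)
    return total


example_total = multiple_choice_total([("normal", "normal"), ("neurotic", "psychotic")])
```

Because each weight simply reflects the kind of record a concept came from, a higher total has a single, clear meaning, unlike a total built from unrelated arbitrary weights.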
Another interesting, objective approach utilizing the multiple-choice
format is the concept evaluation technique developed by
McReynolds (29). Using Beck’s list of good and poor concepts
according to form level (3), McReynolds selected a set of good and
poor concepts spread throughout the 10 Rorschach plates. The
subject is shown the location of the concept and is asked to indicate
whether or not the inkblot looks like the concept. Generally given
after a standard Rorschach as part of [...], McReynolds’ concept
test yields an objective, reliable, and well-defined measure of the
degree to which the subject can discriminate good from poor
concepts. One of the main advantages of McReynolds’ test is the
fact that the number of discrete stimuli (intact areas of inkblots)
has been increased by breaking up the standard 10 Rorschach plates
into their components. This point is a highly significant departure
from the usual ipsative method of allowing repeated responses to
the same stimulus and probably accounts for the satisfactory
internal consistency (split-half reliability of .82) that McReynolds obtained.

As Harrower has pointed out, the highly structured multiple-choice
versions of the Rorschach are no longer like the standard
individual Rorschach except for the inkblots themselves. One could
go a step further and question whether or not tests that have
completely fixed response alternatives can even be called
“projective” techniques. In all respects they appear to be objective
tests of perception which may have implications for the measurement
of important personality traits. The course of development
from an unstructured projective technique to a completely
structured objective test is complete.

A New Solution 

The fundamental question of how to develop sound scoring
procedures for responses to inkblots while preserving the rich
qualitative projective material of the Rorschach has been approached
from a new point of view at The University of Texas.1 The major
modifications undertaken consist of greatly increasing the number
of inkblots while limiting the number of responses per card to one,
and extending the variety of colors, patterns, and shadings used in
the original Rorschach materials. From an exploratory study it was
concluded that a test containing 45 inkblots, to each of which only
one response is given, would be feasible to construct and would
probably tap essentially the same variables as the classical
Rorschach method. Special efforts might have to be made, however, to
develop materials which have high “pulling power” for responses
using small details, space, color, and shading attributes, to
compensate for the tendency to give form-determined wholes as the
first response to an inkblot.

Such a test would have several advantages over the standard
Rorschach: (a) The number of responses per individual would be
relatively constant. (b) Each response would be given to an
independent stimulus, avoiding the weaknesses inherent in the
Rorschach, where all responses are lumped together regardless of
whether they are given to the same or different inkblots. (c) Having
a fresh start in the production of stimulus materials, especially in
view of recent experimental studies of color, movement, shading,
and other factors in inkblot perception, would yield a richer variety
of stimuli capable of eliciting much more information than the
original 10 Rorschach plates. And finally, (d) a parallel form of
the test could easily be constructed from item-analysis data in the
experimental phases of test development, and adequate estimates
of reliability could be obtained independently for each major
variable.

1 Initial impetus for this research was given the writer by a
Fellowship from the Social Science Research Council, Inc., of New York.
More recently the research program has been supported by a
grant-in-aid from the Hogg Foundation for Mental Health, The University
of Texas.

The research to date has borne out all original expectations. Two
matched alternate forms, A and B, of the Holtzman Inkblot Test
have been developed, each containing 45 inkblots. Two additional
blots are common to both forms of the test and appear as practice
blots before the others. Instructions to the subject are similar to
those used in the standard Rorschach, with the exception that the
subject is asked to give just the primary response to each card, and
a brief, simple inquiry is made after each response where necessary
to clarify the location or determinants. Administration of the test
is easier than the Rorschach, and for the subject, giving only one
response per card is generally a fairly simple task.

Six major variables are scored for each response, while a number
of minor variables or qualitative signs are scored when deemed
appropriate. The major variables were selected and defined according
to the following criteria: (a) The variable had to be one which could
be scored for any legitimate response. Variables which only rarely
occurred were set aside for the moment. (b) The variable had to
be sufficiently objective to permit high scoring agreement between
trained individuals. (c) The variable had to show some a priori
promise of being pertinent to the study of personality through
perception. And (d) each variable must be logically independent of
the others. Location, Form Appropriateness, Form Definiteness,
Color, Shading, and Movement Energy Level were selected for
intensive study and provided the basis for item-analyses in the final
selection and matching of inkblots for Forms A and B.

Location as a variable was defined strictly in terms of the amount
of blot used and the extent to which the natural gestalt of the blot
was broken up by the response. A three-point weighting system
was adopted with “0” for wholes, “1” for large areas, and “2”
for small areas, making possible a theoretical range of scores from
0 to 90.

The scoring of color was based entirely upon the apparent
primacy or importance of color, including achromatic color, as a
response determinant. When the subject mentioned the color in his
response, scoring was relatively simple. On those occasions when
it was apparent that the response would have been improbable
without the presence of color, credit for color was given even
though color was never mentioned by the subject. A four-point
system similar to the Rorschach was adopted, with “0” for completely
ignoring color and “3” for use of color as the sole determinant. Total
scores for Color have a theoretical range from 0 to 135.

While subtle distinctions in the different uses of shading as a
determinant are usually made in the Rorschach, no such
differentiations are made in the Holtzman Inkblot Test. As with Color, the
scoring of Shading was based solely upon the apparent primacy of
shading as a determinant. Because pure shading responses are
rare, only a three-point scoring system was used, yielding a
theoretical range from 0 to 90.

The scoring of movement is linked closely to content in most contemporary scoring systems for the Rorschach. Too frequently, such practices lead to highly arbitrary conventions as to whether movement is scored or how it is scored. In the Klopfer system, for example, “airplane” and “bat” present difficult problems. Can you be sure the airplane is flying? Even when an airplane does fly, there is no movement of its parts and no movement relative to a frame of reference unless landscape is added. Is “bat” to be scored FM for animal movement while “airplane” is scored Fm for inanimate movement, when both concepts are really perceptual alternatives rather than uniquely different responses? The resulting picture is often highly confusing from a psychometric point of view. The essential character of the movement response is its energy level or dynamic quality, rather than its particular content. Leaning heavily upon Zubin (52), Sells (42), and Wilson (49), a five-point scale was adopted, varying from “0” for no movement or potential for movement, through static, casual, and dynamic movement, to a weight of “4” for violent movement such as whirling or exploding. Movement Energy Level ranges theoretically from 0 to 180.
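Each theoretical range above is the highest weight on the variable's scale multiplied by the number of inkblots in a form. The sketch below makes that arithmetic explicit. It assumes 45 inkblots per form, which is the number implied by the stated 0-to-180 range of the five-point (0 to 4) Movement scale. The function and dictionary names are illustrative only and are not part of the test materials.

```python
# Illustrative sketch of the arithmetic behind the theoretical score ranges.
# ASSUMPTION: 45 inkblots per form, the number implied by the stated 0-to-180
# range of the five-point (0-4) Movement Energy Level scale.
N_INKBLOTS = 45

# Highest weight a single response can earn on each scale described so far;
# the lowest weight is always 0.
MAX_WEIGHT = {
    "Location": 2,               # 0 = whole blot ... 2 = small area
    "Color": 3,                  # four-point scale, 0-3
    "Shading": 2,                # three-point scale, 0-2
    "Movement Energy Level": 4,  # five-point scale, 0-4
}


def theoretical_range(variable, n_inkblots=N_INKBLOTS):
    """(min, max) total when each inkblot contributes one scored response."""
    return 0, MAX_WEIGHT[variable] * n_inkblots


def total_score(weights, variable):
    """Sum one weight per inkblot after checking each lies on the scale."""
    top = MAX_WEIGHT[variable]
    for w in weights:
        if not 0 <= w <= top:
            raise ValueError(f"{variable} weight {w} is outside 0..{top}")
    return sum(weights)


for name in MAX_WEIGHT:
    print(name, theoretical_range(name))
```

Under the 45-inkblot assumption, the sketch gives 0 to 90 for Location and Shading, 0 to 135 for Color, and 0 to 180 for Movement Energy Level.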

Different authorities vary in the extent to which concept elaborations and specifications are confounded with the goodness of fit of the concept to the form of the inkblot. In the Holtzman Inkblot Test, Form Definiteness was defined independently of form level in the usual sense. It refers solely to the definiteness or specificity of the form of the concept represented in the response, disregarding completely the characteristics of the inkblot. Working independently with a large number of concepts culled from inkblot responses, five psychologists placed them in rank order with the most form-definite concept at the top. The independent sets of ranked concepts were then merged to yield an overall rank order for the entire list. Cutting points were chosen so that five levels of form definiteness could be distinguished. The resulting set of examples served as a scoring manual, with a weight of “0” for the most indefinite concepts, such as anatomy drawing, squashed bug, or fire, and a weight of “4” for the most definite concepts, such as Indian chief, violin, or knight with a shield. Form Definiteness has a theoretical range from 0 to 180.
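The text says only that the five judges' rankings were merged and that cutting points were then chosen. It does not say how either step was done. The sketch below shows one simple way to carry out such a procedure, and both choices in it are assumptions for illustration: concepts are ordered by their mean rank across judges, and the merged list is cut into five bands of nearly equal size, weighted 4 (most definite) down to 0. The toy data use three judges and placeholder concepts "A" to "E", not the original materials.

```python
# Hypothetical sketch: merge several judges' rankings by mean rank and cut the
# merged order into five Form Definiteness levels (4 = most definite, 0 = most
# indefinite). Mean-rank merging and equal-width bands are ASSUMPTIONS; the
# text says only that the rankings were merged and cutting points chosen.
from statistics import mean


def merge_rankings(rankings):
    """Each ranking lists the same concepts, most form-definite first.
    Returns the concepts ordered by mean rank (ties broken alphabetically)."""
    concepts = rankings[0]
    mean_rank = {c: mean(r.index(c) + 1 for r in rankings) for c in concepts}
    return sorted(concepts, key=lambda c: (mean_rank[c], c))


def assign_levels(merged, n_levels=5):
    """Cut the merged order into n_levels bands of (nearly) equal size."""
    n = len(merged)
    levels = {}
    for position, concept in enumerate(merged):
        band = position * n_levels // n          # 0 for the top band
        levels[concept] = (n_levels - 1) - band  # top band gets the highest weight
    return levels


# Toy data: three judges ranking five placeholder concepts "A" to "E".
judges = [
    ["A", "B", "C", "D", "E"],
    ["B", "A", "C", "E", "D"],
    ["A", "C", "B", "D", "E"],
]
merged = merge_rankings(judges)
print(merged)                 # ['A', 'B', 'C', 'D', 'E']
print(assign_levels(merged))  # {'A': 4, 'B': 3, 'C': 2, 'D': 1, 'E': 0}
```

In practice, the cutting points described in the text were set by judgment, so the bands need not be equal in size. Passing explicit band boundaries would be a natural change to this sketch.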

Form Appropriateness, the last of the six major variables, is by its very nature a subjective variable, requiring extensive preliminary work to make scoring reasonably objective. And yet, it is this very subjectivity which gives the variable great theoretical importance. Beck (3) recognized the likelihood that goodness of fit of the concept to the form of the inkblot would be closely related to degree of contact with reality. He undertook a major study of form level that has proved to be one of the most valuable contributions to the Rorschach. Considerable effort was spent in arriving at acceptable standards for scoring Form Appropriateness. Different responses to each inkblot were listed separately for each location and rated independently by at least three judges. A seven-point scale was used, with “0” representing extremely poor fit. Although there was good agreement among judges in most cases, a final judgment on each response was reached only after full discussion in conference. The resulting manual provides a guide to the scoring of Form Appropriateness on a three-point system, with “0” for poor form and “2” for unusually good form. Form Appropriateness can range theoretically from 0 to 90.

Scoring reliability for the six variables, based upon a sample of 46 records, proved in general to be very high. Product-moment correlations were .99 for Location, Form Definiteness, and Movement Energy Level, .97 for Shading, and lower for Form Appropriateness. Good estimates of internal consistency were obtained by using the matched random subtest method (18). These correlations were lowest for Form Appropriateness and highest, .91, for Shading. All six variables proved to be reasonably normal and continuous in distribution. Studies are now underway to determine the correlations between Forms A and B with several time intervals and populations of subjects.

Once the standardization of the Holtzman Inkblot Test is complete, it should be possible to develop special versions of the test for measuring variables of particular interest. Seymour Fisher and Sidney Cleveland have already had some success along these lines with a multiple-choice version of the Group Rorschach for their Barrier score.⁵ Each item offered three responses: a Barrier response, a Penetration response (such as “x-ray”), and one which was neutral (such as “flower”). The subject was asked to check the one he liked most and to place a different mark on the one he liked least, leaving the third choice blank. Both the Group Rorschach and the new multiple-choice test were given to 60 college students by Cleveland. The correlation between the two sets of Barrier scores was .64. This fairly high correlation carries weight alongside two further findings: the distribution of scores on the multiple-choice test was much wider than on the Rorschach, and it was more nearly normal in shape. Together these suggest that the multiple-choice Barrier score would be superior to the measure reported earlier by Fisher and Cleveland.

Considerable ground has been covered in this analysis of some of the more common problems encountered in the objective scoring of projective techniques. The projective hypothesis holds that an individual will reveal something of his private self in the way in which he responds to ambiguous stimuli. The very nature of this hypothesis has encouraged an almost unbelievably wide range of assessment techniques under the rubric of projective methods. By focusing upon quantitative methods of analysis and their objectivity as measured by reproducibility, this analysis has deliberately side-stepped a whole host of important problems concerning the meaning of projective responses. Concepts of validity and their empirical determination, examiner-subject interactions, and variability of response across different populations of subjects have been dealt with only tangentially, if at all.

One cannot help but observe that few, if any, of these many projective devices can serve two masters well at the same time. This is particularly true when their original purpose is exploitation of the projective hypothesis in the clinical diagnosis of personality. The assumptions and historical biases of the projective approach and of the psychometric approach are not necessarily incompatible. They do, however, lie at opposite extremes of a continuum defined roughly by the degree of structure and control that the method imposes on the subject's response.

5 Personal communication from Dr. Sidney E. Cleveland.
An unfortunate and bewildering array of inadequate quantification characterizes most projective techniques when the projectivist is pressed to conform to the rigorous statistical standards of psychometric theory without any concomitant pressure to revise the technique itself. A major challenge to psychologists interested in the objective assessment of personality is the development of psychometrically sound personality tests from available projective devices. Thurstone (45) made this point 10 years ago, and it still stands today.
An Approach to the Objective Assessment of Successful Leadership*

Bernard M. Bass 

Louisiana State University 


“Had I been present at the creation,” Alphonso the Learned (1221-1284 A.D.) quipped, “I would have given some useful hints for the better ordering of the universe.” Alphonso could have made the same comment today about the chaos in leadership theory and research.

The construction of typologies often represents first attempts to 
bring some order, for understanding of a phenomenon usually can 
begin only after some of its important elements are guessed. In the 
field of leadership, we were abundantly blessed with a wealth of 
guesses. Fisher (27) listed some 19 distinct ways of typing leaders 
revealed in the literature from 1915 to 1948, for leadership has been 
a topic covering a wide variety of phenomena, many only remotely 
related to each other. But these typologies are a mere beginning 
compared to the one hundred or more available definitions. 

One cluster of these defines the leader as an individual in a given office. Status is a more useful way to describe variations among individuals in position, and in behaviors due to their different positions, since it is commonly employed for this purpose by those working in the fields of organization, group behavior, and industrial psychology. A second cluster of definitions emphasizes the leader as focus of attention, as representative of the group. Again, a widely used concept is already available: esteem, the value of members to the group regardless of their position.

The leader often is defined simply as anyone who engages in leadership acts. But what is a leadership act?

* This work was aided by funds from the Louisiana State University Council on Research and Contract N7 ONR 33609.


Leadership Defined 

Agreeing with Bowman (21) and Gibb (33), I consider leadership an interaction between members of a group. Although the groups are usually face to face, this is not considered a necessary condition for the occurrence of leadership; it is rather a usual condition. Leadership occurs when one member's behavior is concerned with changing another member's behavior.

This definition is close to those of Gurnee (35) and of LaPiere and Farnsworth (46), who defined leaders as agents of change, as persons whose acts affect other people more than other people affect them. It also conforms to Smith's (65) conceptualization of controlled interaction, and to those definitions that treat leadership as influence and as behavior making a difference among groups.

A may try to change B's behavior; this is attempted leadership. B may actually change his behavior as a consequence of A's attempt; this is successful leadership. B's change may result in B's own goal attainment; this is effective leadership (38).
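The three categories are nested. Effective leadership is a special case of successful leadership, which is in turn a special case of attempted leadership. The small decision rule below states that nesting explicitly. It is only an illustrative sketch. The field names of `Act` are hypothetical labels for the three questions the definitions ask and do not come from the chapter.

```python
# Illustrative sketch of the three nested categories defined above.
# The Act fields are hypothetical labels for the three questions the text asks.
from dataclasses import dataclass


@dataclass
class Act:
    a_attempts_change: bool   # Is A's behavior aimed at changing B's behavior?
    b_changes: bool           # Does B's behavior change as a consequence?
    b_attains_own_goal: bool  # Does B's change result in B's own goal attainment?


def classify(act: Act) -> str:
    """Return the most specific label; effective implies successful implies attempted."""
    if not act.a_attempts_change:
        return "not a leadership act"
    if not act.b_changes:
        return "attempted leadership"
    if act.b_attains_own_goal:
        return "effective leadership"
    return "successful leadership"


print(classify(Act(True, True, False)))  # successful leadership
```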

This conceptualization differs from Hemphill's (39) mainly in the kinds of changes included in the meaning of leadership. For Hemphill, leadership acts are limited to those concerning alteration of consistent patterns of interaction within the group. Excluded are signals, task analyses, expressions of attitudes, information giving or asking, requests for suggestions, proposals, and acceptance or rejection of earlier suggestions. I have chosen a much broader definition. Each of these acts excluded by Hemphill generally will be regarded as leadership, although this will depend on the function of the specific act.

What are the ways in which A can change B's behavior? A can alter B's drives both in strength and in direction. Stated in different terms, A can change what B regards as his goals and the importance of these goals. By pointing out the challenge and rewards of studying law, a professor may arouse or strengthen in a student a strong interest in a law career. A Caesar or a football coach may arouse his
men to fighting pitch with a speech before battle. For us, these are acts of leadership. Lowering of motivation may also be included. A baseball manager providing strong fatherly reassurance to his “too-tense” team near the end of the race for the league championship is engaged in a leadership act.

A can relatively strengthen or weaken the responses of B to various stimuli. A can change B's behavior by reinforcing certain habits or by reducing the strength of these tendencies. Included among these habits are abilities (habits where we evaluate the response in terms of success in goal attainment) and attitudes, faiths, and beliefs (habits where the response is towards what stimulated it). Another way of describing the same phenomenon is to state that A can alter B's abilities to cope with his immediate problem. Concretely, consider a sales manager who informs his subordinate that he must submit accurate, clear, daily reports of his activities. If the manager then occasionally compliments the salesman on reports that are done well and criticizes items that are not presented correctly, he is strengthening a habit pattern of submitting the desired reports. This is leadership. Similarly, a counselor of a group in therapy who neither “rewards” a neurotic member for his emotional outbursts nor shows alarm may be serving to reduce the strength of the neurotic's tendency to exhibit such behavior. Therefore, the counselor is displaying leadership.

Arbitrary Restrictions on the Meaning of Leadership

Changing the immediate needs of B and B's ability to satisfy his motivation are not the only ways of modifying B's behavior. For example, altering the integrity of the central nervous system of the organism, B, via surgery, injury, drugs, etc., will modify B's behavior. Also, B's behavior may be changed by changing the circumstances stimulating B. Alterations of the integrity of an organism's central nervous system and manipulations of the stimulating situation are arbitrarily excluded, by definition, as leadership acts. Psychosurgery and drug therapy are not leadership acts, by definition. Changing slides projected on a screen is not, as such, a leadership act, by definition. On the other hand, teaching and psychotherapy, which are not usually considered leadership processes, must be included in what I have defined as leadership, because they cannot be differentiated by definition from the more general concept of leadership as defined here. But leadership, as defined, can be distinguished from a number of related concepts such as behavioral contagion, influence, followership, and vicarious experience. Space limitations prohibit discussion here.

Titles and office-holding involve much more than leadership. It is necessary to distinguish the leadership displayed by a foreman from all the behavior of a foreman, for foremen shuffle papers, compute output figures, check inventories, operate equipment, and so on. If leadership were defined as anything done by one who holds an office or by one who is designated a leader, we would find ourselves trying to develop principles encompassing almost all human behavior.


Measurement of Successful Leadership 
Table 1 illustrates many of the possible ways of measuring suc- 
cessful leadership. 

TABLE 1

Some Ways of Assessing Leader Behavior

Historical
  Self: Autobiographical
  Other Members of Groups: Recollection
  Observations, Records and Instruments: Historical Documents, Biographies, Case Histories, Interviews

Selective
  Self: Announcement of Candidacy
  Other Members of Groups: Sociometry; Voting
  Observations, Records and Instruments: Nominations and Election Results

Job Placement
  Self: Usurpation
  Other Members of Groups: Cooptation; Appointment to Office
  Observations, Records and Instruments: Appointments and other Administrative Acts

Ratings and Check Lists
  Self: Self
  Other Members of Groups: Superior, Buddy-Peer, Subordinates
  Observations, Records and Instruments: Observed Roles Played, Frequency of Acts; Test Results

Projective
  Self: Projective Sketches
  Other Members of Groups: Projective Sketches
  Observations, Records and Instruments: Thematic Analysis of Content of Essays

Effect on Others
  Other Members of Groups: Example: Satisfaction with Group Effort
  Observations, Records and Instruments: Observed Changes in Groups; Overt Changes in Members and Groups
Historical

Naturalistic case-histories are a common approach employed by 
many social scientists, such as applied anthropologists, to assess 
leadership in primitive societies, industrial groups, and volunteer 
agencies. Often associated with this process is intensive interview- 
ing of present or former members of the group to obtain their recol- 
lections and present opinions. Heavy reliance on such methods can 
be found in the works of Elton Mayo (48), F. J. Roethlisberger (60), and Burleigh Gardner (32).

Analyses of case histories, as such, were employed by Ackerson 
(1) and Brown (22). Examination of biographies to assess leadership 
was exemplified by Cox’s study (25) of the biographies of 300 
geniuses. Less formal attempts of this sort began with the first 
historical writing. For example, Plutarch's Lives paired Roman and 
Greek leaders to assess each member of a pair in comparison with 
the other. 

Selective 

Stogdill (66) listed 28 major studies assessing leadership by choice of associates, suggesting the work of Jennings (44) as most exemplary. Many of these actually focused on esteem, not leadership per se. Although the procedure has been given much prominence by the sociometrists only in the past two decades, it is found as early as Terman's 1904 leadership experiment (70). Nominations by observers outside the group have also been commonly employed (e.g., Burks (23)).

Studies of election results have been of concern to political scien- 
tists, public opinion analysts, and related social scientists. Attention 
here is often centered on voting behavior rather than on leader 
behavior stimulating the voting. However, Sanford’s (61) work 
illustrates how political elections are related to the personality needs 
of the voter and the stimulus properties of the office-seekers. 

Job Placement 

Coups d'état and other forms of usurpation, as well as observed and recorded legitimate administrative acts, provide further means of assessing successful leadership. Appointment to office and tenure also are guides to identifying and studying leader behavior. Stogdill (66) mentioned 33 psychological studies of persons occupying positions of leadership ranging from fraternity presidents to government administrators. But most of these are studies of status rather than of leadership as I define it.
Ratings and Check Lists

Most common at present is the use of specially developed rating procedures and check lists for studying leader behavior. For example, the work of Stogdill and Coons (67) and associates illustrates the factorial approach to construction of behavior check lists. This work began with empirical surveys and concluded with theoretically oriented, factorially independent scales of leader behavior. These scales have been used to describe leaders by superiors, associates, and subordinates, as well as by leaders themselves. The widespread use of peer ratings or buddy ratings in the military services is another example of ratings used to assess leader behavior (42).

More objective and more theoretically based in their construction are the categorizations of roles by Benne and Sheats (17) and the interaction process analysis of Bales (4). In the same class is Thelen's (71) description of a method of categorizing behavior in groups based on Bion's work-emotionality concepts. In Benne and Sheats' method, observers note which of many defined roles are played by the various members of the group. For example, successful leaders of initially leaderless discussions have reliably been observed playing the roles of initiator-contributor, opinion giver, elaborator, compromiser, orienter, evaluator, energizer, and encourager (7). Bales' procedures reduce subjectivity further. Each action by a member is categorized as one of twelve types falling into four areas. The frequency with which each member exhibits each type of behavior can be measured with high observer reliability. Again, leaders in initially leaderless discussions are found to exhibit certain of these behaviors with high frequency, particularly those in the areas of attempted answers and positive socio-emotional responses.
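Bales' procedure comes down to assigning each act to one of twelve categories and then counting, for each member, how often each category or area occurs. The sketch below shows that tally. The twelve category labels and their grouping into four areas follow Bales' published scheme as commonly described. They are supplied from general knowledge, because the chapter itself does not list them. The member names and observed acts are hypothetical.

```python
# Sketch of tallying interaction-process scores per member and area.
# The twelve category labels follow Bales's published scheme (from general
# knowledge; this chapter does not list them). Member names are hypothetical.
from collections import Counter, defaultdict

AREAS = {
    "positive socio-emotional": ["shows solidarity", "shows tension release", "agrees"],
    "attempted answers": ["gives suggestion", "gives opinion", "gives orientation"],
    "questions": ["asks for orientation", "asks for opinion", "asks for suggestion"],
    "negative socio-emotional": ["disagrees", "shows tension", "shows antagonism"],
}
CATEGORY_TO_AREA = {cat: area for area, cats in AREAS.items() for cat in cats}


def tally(observations):
    """observations: (member, category) pairs, one per scored act.
    Returns {member: Counter(area -> frequency)}; unknown categories raise KeyError."""
    counts = defaultdict(Counter)
    for member, category in observations:
        counts[member][CATEGORY_TO_AREA[category]] += 1
    return counts


acts = [
    ("Ann", "gives opinion"), ("Ann", "gives suggestion"), ("Bob", "agrees"),
    ("Bob", "asks for opinion"), ("Ann", "shows solidarity"), ("Cal", "disagrees"),
]
for member, areas in tally(acts).items():
    print(member, dict(areas))
```

Counting at the level of the twelve categories rather than the four areas would only require keying the `Counter` on `category` instead of on its area.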

Tests 

Early attempts to describe leader behavior were characterized by “armchair” listings of traits found among successful leaders, compiled by the leaders themselves, by observers, or by surveyors of the leadership literature. Another, similarly indirect approach was based on administering personality inventories and other psychological tests to designated leaders and inferring leader behavior from the traits found to predominate among them (66).

Projective Techniques

Torrance (73) has presented ambiguous sketches of leader-follower situations to groups. The stories told by the groups appear to provide reliable indices of the overt effectiveness of leadership within the group. Indirect information about leader behavior has also been gathered by administering the Thematic Apperception Test to business executives (40). The My Job Contest (20), which involved analyzing the themes of essays submitted as contest entries by workers at General Motors, represents still another projective approach to studying leadership and behavior in groups.


Need for Objective Assessment 

If we are primarily concerned with understanding, predicting, and controlling leader behavior as such, it becomes desirable to develop ways of “sensing” the behavior itself. On the plane of observables (51), we need sense impressions that correspond statistically to the constructs. The bridge from the theoretical model to the protocols of the laboratory will be firmest if subjective impressions do not intervene between the theoretical constructs and the “world of facts.” The facts about behavior are vague. They are doubly difficult to deal with when gained “secondhand” from observers' or group members' reports. We desire operations that are definite and, if possible, quantitatively precise; operations that are repeatable and objective. In order to study leadership experimentally, we should like to anchor our definition in leader behavior measured objectively enough to keep it from being tampered with by observers' or participants' biases, unless we want to study biases.

Dangers of Subjectivity 

Reliance on observers', participants', or subjects' mediation of the raw data to be examined gives rise to dangers. Viteles (76) illustrates the error possible in depending on case history and interview studies. The “Hawthorne” investigators attributed much of the control of behavior in a bank-wiring room to an informal organization which demanded conformance to norms or common standards. The workers did not reveal to interviewers their animosity toward management because parts were re-engineered when time studies were in error in favor of the workers. Nor did they report their “stretchout” of work, which was due to fear of being laid off in the depressed economy.

Even categorizing observed behaviors falls short of what is desired. Using the same 13 categories of behavior (a modification of Bales' classification), Bell (16) studied leader behavior in 10 groups in a laboratory at a northern university and in 10 groups at a southern school of the same size. Significant differences appeared in the categorized behavior of emergent leaders in each location. Some of this may have been due to overt behavior differences, but some of it was probably a function of the observers. Again, Borgatta (20) asked subjects to write how they would react to the Rosenzweig Picture-Frustration Test. Then the subjects showed how they would act, and finally they were placed in the situation. There were no significant correlations among the three methods of response. Similarly, Halpin (36) reported little relationship between a leader's beliefs about what he should do and what his subordinates said he actually did. The picture is complicated further by the observation that members of a group, when stating their own opinions, tend to compromise between what they privately “sense” and what they perceive to be the group opinion on the matter (34). Mencius (372 B.C.-289 B.C.) recognized the difficulties of judging a leader's behavior solely on the word of his immediate superiors. In paraphrase, his advice to heads of state was:

“When all those about you, the ruler, say that a man is talented, do not immediately rush to promote him. Only after his subordinates say so also should you examine him more fully as a candidate for promotion. In the same way, do not rush to demote a man on the evaluation of his superiors alone.”

Once we have a rationale for understanding behavior, we must have measurements to promote and communicate our understandings:

. . . the language of number sometimes provides a certain minimum standard of integrity in communication without which cooperation of human beings on some kinds of subjects is almost fruitless. Lord Kelvin declared, “If you can't measure it, you don't know what you are talking about” (78, pp. 360-368).


Objective Measurement of Influence 
Many of the early social psychology experiments on suggestibility provided objective assessments of leadership. The leader usually was the investigator, and the followers were his child subjects. For example, Triplett (74) pretended to throw a ball into the air. He then determined the percentage (about 50 per cent) of fourth- to eighth-grade children who actually saw the ball go up and disappear. Binet (18) assessed susceptibility to influence by having subjects draw lines indicating their judgment of the length of stimulus lines. The stimuli increased in size up to a certain point. The judgments of the suggestible subjects continued to increase even after the stimuli presented by the leader, Binet, stopped increasing.
In more recent years, the classic studies by Sherif (63) and Asch (3) have provided similar illustrations of objective approaches to the study of influence. Sherif showed how subjects' reports of the movement of a pinpoint of light (actually stationary) could be altered when they heard other subjects' judgments of the same autokinetic effect. Asch found that some subjects could be made to declare that the shorter of two lines was actually longer if all other members of their group (all experimental “plants”) declared such was the case.

Objectivity in assessment of group products also has been common in social psychology. For example, Mayer (47) and many later investigators, such as Weston and English (77), compared the speed and accuracy of children's performance on selected tasks when working alone and in the presence of co-workers. They found that the presence of others facilitated performance. More recent studies have compared supervisors of more productive departments with supervisors whose departments rank lower in productivity or on other objective indices of group performance (45).

Studies of communication nets, initiated by Bavelas (15), are another example of the development of objectivity in studying influence and leadership. All communications between members of groups are restricted to the passing of symbols or notes. Objective analyses of who passes what to whom are the basis for testing hypotheses.

Objective Approaches to Measuring Leadership 

Research evidence supports in a number of ways the possibility of developing relevant, objective, controlled laboratory operations to study leadership. Aikman, Lorge, et al. (2) presented the same problem at four levels of “remoteness from reality.” The problem was to plan the movement of five men across a “mined” road. The quality of solutions was equally good at all four levels: verbal description, photographic presentation, a miniature scale model not allowing manipulation, and a scale model allowing manipulation.

A review of validity studies of the leaderless group discussion concluded that observed success as a leader in this restricted, artificial, brief situation correlated with the leadership performance of the same persons in real life (6). Flanagan, Levy, et al. (28), among several investigators, showed that it was possible to construct highly reliable situational tests for evaluating success as a noncommissioned officer. The tests were based on critical-incident analyses of the leadership positions. They included initially leaderless emergencies and leaderless small-job management, as well as combat and reconnaissance leadership in which the examinee was designated as leader. Test performance predicted merit as a noncom (r = .46). Specific acts of effective and successful leadership were observed and recorded. The average agreement among observers' reports was .83.

Maximum control of experimental conditions has been achieved where the group itself is simulated. Each member is stimulated in the same way while he believes he is behaving as a member of a group. Typical of such studies is one by Raven and Rietsema (58). Members of the simulated groups are separated. Each is told he is performing a different aspect of the task, but all actually do the same job. Standard notes are sent to each subject, although he thinks they come from the other members. Every subject thus receives the same “group” experience.

An Objective Approach to Assessing Successful Leadership 

Change in member judgment as a result of interaction with others 
has been studied on a number of tasks. For example, Jenness (43) 
examined the changes in the judgment of the number of beans in 
a bottle. Asch's (3) and Sherif's (63) techniques, mentioned earlier, 
are similar examples. Timmons (72) appears to be the first to have 
used the differences in correlations among ranked judgments to 
quantify the effects of group influence. He found that the accuracy 
in ranking solutions to a problem (as measured by the correlation 
of subjects' judgments with the correct judgments) was greater 
among subjects given the opportunity to discuss the problem with 
others. Preston and Heintz (56) and Hare (37) used the correlations 
among judgments by members of a group to measure stability of 
judgment, initial or final agreement among members, and degree of 
acceptance of the group decision. Talland (64) went one step 
further, finding that the correlation between a member's initial 
judgments and the final group decision was higher among those 
rated as leaders.

These within-subject correlational procedures offer a way of 
objectively determining the successful leadership of each group 
member as well as related indices of group behavior. They enable us 
to find objectively how much each member of a group changed 
every other's opinion. At Louisiana State University, methods have 
been standardized as follows. On each of a series of test problems, 
a group of subjects privately rank order their initial decisions about 
the true order of familiarity of five words. Or they may be asked to 
rank five cities according to size of population. Or they may have 
to decide on the order of merit of solutions to problems in human 
relations case histories. Then, they carry on a discussion to reach 
a group decision. Finally, they privately register their own rankings 
again.
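Every index in this procedure rests on rank-order correlations between two rankings of the same five items. A minimal sketch of Spearman's rank-order coefficient for untied ranks is given below; the function name `spearman_rho` and the example rankings are hypothetical, and the original analyses were run on an IBM 650 or an analog computer, not with code like this.

```python
def spearman_rho(ranks_a, ranks_b):
    """Spearman rank-order correlation for two untied rankings of the same items.

    Each list gives the rank (1..n) assigned to item i; ties are not handled.
    """
    n = len(ranks_a)
    if n != len(ranks_b) or n < 2:
        raise ValueError("rankings must have equal length >= 2")
    # Sum of squared rank differences, item by item.
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks_a, ranks_b))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


# Hypothetical subject: ranks given to five words before and after discussion
# (1 = most familiar). Only the first two words swap places.
initial = [1, 2, 3, 4, 5]
final = [2, 1, 3, 4, 5]
print(spearman_rho(initial, final))  # -> 0.9
```

The same function serves for member-to-member, member-to-group-decision, and member-to-correct-order comparisons; only the pair of rankings changes.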

Three measures of successful leadership— public, private, and rela- 
tive— are derived from the correlations between members in opinion 
before and after discussion: 

Successful public leadership of a member is how much more the 
group decision correlates with his initial decision than with other 
members' initial decisions, added to how much the final group 
decision is like the designated member's final ranking compared to 
how much he had disagreed with the others initially.

Successful private leadership is how much less a member changes 
his rankings than do other members added to how much more the 
other members' rankings correlate with his rankings after discussion 
than before. 

Relative success as a leader is how much more the final decisions 
of other members correlate with the initial decision of a designated 
member compared with how much the final decision of the desig- 
nated member correlates with the initial decisions of the other mem- 
bers. 

The total amount of absolute public or private leadership turns 
out to be algebraically equivalent to the coalescence of a group 
(how much members increase in agreement with each other). Total 
relative successful leadership of all members combined is always 
zero. 
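The verbal definitions above leave room for more than one algebraic reading. The sketch below is one hypothetical formalization of the relative and private measures, assuming that a member's "change" is 1 minus the correlation between his own initial and final rankings, and that coalescence is the mean pairwise agreement after discussion minus the mean before. Under these assumptions the relative scores sum to zero, and the total private score equals the number of members times the coalescence, that is, it is proportional to it. The public measure is omitted because it also needs the group decision. Function names such as `relative_leadership` are illustrative, not the author's.

```python
def spearman_rho(a, b):
    """Spearman rank-order correlation for two untied rankings of equal length."""
    n = len(a)
    d_squared = sum((x - y) ** 2 for x, y in zip(a, b))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


def _mean(values):
    values = list(values)
    return sum(values) / len(values)


def relative_leadership(initial, final):
    """Hypothetical reading: how much others' final rankings follow member i's
    initial ranking, minus how much i's final ranking follows theirs."""
    n = len(initial)
    scores = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        influence = _mean(spearman_rho(final[j], initial[i]) for j in others)
        influenced = _mean(spearman_rho(final[i], initial[j]) for j in others)
        scores.append(influence - influenced)
    return scores


def private_leadership(initial, final):
    """Hypothetical reading: (others' mean change - own change) plus the rise in
    others' agreement with member i from before to after discussion."""
    n = len(initial)
    stability = [spearman_rho(initial[k], final[k]) for k in range(n)]
    scores = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        # Own change is 1 - stability[i], so "changes less than others"
        # reduces to this difference in stability.
        less_change = stability[i] - _mean(stability[j] for j in others)
        gain = _mean(spearman_rho(final[j], final[i]) - spearman_rho(initial[j], initial[i])
                     for j in others)
        scores.append(less_change + gain)
    return scores


def coalescence(initial, final):
    """Mean pairwise agreement after discussion minus mean pairwise agreement before."""
    n = len(initial)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    before = _mean(spearman_rho(initial[i], initial[j]) for i, j in pairs)
    after = _mean(spearman_rho(final[i], final[j]) for i, j in pairs)
    return after - before


# Hypothetical three-member group ranking five words.
initial = [[1, 2, 3, 4, 5], [2, 1, 3, 5, 4], [5, 4, 3, 2, 1]]
final = [[1, 2, 3, 4, 5], [1, 2, 3, 5, 4], [2, 1, 3, 4, 5]]
print(relative_leadership(initial, final))  # member 0 scores highest
print(sum(private_leadership(initial, final)), 3 * coalescence(initial, final))
```

Member 0 never moved while the others moved toward his initial ranking, so he receives the largest relative score; the two numbers on the last line agree up to floating-point rounding.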

Methods of Data Processing 

Earliest data collection was by paper and pencil (8). Subsequent 
to these initial analyses, data have been collected either by asking 
subjects to register their opinions on specially prepared IBM mark-sense 
cards or directly into a specially constructed analog computer 
(12). The cards are processed on an IBM 650 and auxiliary equip- 
ment to yield the required correlations. If the analog computer is 
used, the experimenter reads the correlations directly from an 
ammeter immediately following each problem. 
Measurement Reliability

Table 2 shows the results of four analyses of the split-half 
reliability of the three measures of successful leadership, based on 
performance on 10 or 12 problems, or about an hour of testing.

Motivation of subjects was varied by selecting for the highly 
motivated sample those rating themselves as strongly interested in 
entering advanced ROTC, then collecting the leadership measures 
as part of an entrance screening examination. The samples of 60 
subjects of medium and low motivation were selected by the same 
means. The motivation questionnaire was validated in several ways, 
for example, by finding that it accurately predicted formal 
application to advanced ROTC, as well as a tendency to appear for 
the examination. The fourth sample of 95 subjects was composed of 
arbitrarily selected night school students under no particular 
extrinsic motivation to perform well.

Except for one reversal, a consistent trend emerged: the lower 
the extrinsic motivation of subjects, the more consistent the 
leadership measurements. To maximize consistent individual 
differences in successful leadership, it appears necessary to examine 
subjects under no extrinsic compulsion to perform well (11). Two or 
three hours of testing would raise reliabilities to where the measures 
could be used diagnostically.
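The "corrected" reliabilities in Table 2 are split-half correlations stepped up to full test length, conventionally with the Spearman-Brown formula; the text does not name the correction, so this is an assumption. The same formula gives a rough sense of the claim about two or three hours of testing: a measure with reliability .50 after one hour is projected to about .67 after two hours and .75 after three, provided the added problems are parallel to the original ones. A minimal sketch:

```python
def spearman_brown(r, k):
    """Projected reliability when a test of reliability r is lengthened k-fold
    with parallel material (Spearman-Brown prophecy formula)."""
    if not -1.0 < r < 1.0 or k <= 0:
        raise ValueError("need -1 < r < 1 and k > 0")
    return k * r / (1 + (k - 1) * r)


def corrected_split_half(r_halves):
    """Full-length reliability from the correlation between two half-tests."""
    return spearman_brown(r_halves, 2)


def length_factor_needed(r, target):
    """Lengthening factor k that raises reliability r to the target value."""
    if not 0.0 < r < 1.0 or not 0.0 < target < 1.0:
        raise ValueError("need 0 < r < 1 and 0 < target < 1")
    return target * (1 - r) / (r * (1 - target))


# Hypothetical one-hour reliability of .50, projected to longer sessions.
for hours in (1, 2, 3):
    print(hours, round(spearman_brown(0.50, hours), 2))  # .5, .67, .75
print(length_factor_needed(0.50, 0.75))                  # -> 3.0
```

The inverse function answers the practical question directly: how much more testing is needed to reach a diagnostically useful reliability.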

TABLE 2

Corrected Split-Half Reliabilities
of Measures of Leadership as a Function of Motivation

                                           Motivation
                                  High      Medium    Low       Low
                                  N = 135   N = 60    N = 60    N = 95
Measure                           K = 10    K = 10    K = 10    K = 12

Public successful leadership      .32       .50       .59       .55
Private successful leadership     .30       .44       .75       .52
Relative successful leadership    .48       .29       .61       .64

N = number of subjects
K = number of problems administered
For 133 df, p ≤ .01 when r = .22
For 93 df, p ≤ .01 when r = .23
For 58 df, p ≤ .01 when r = .33
Measurement Validity

This examination will be similar in some respects to an earlier 
publication reviewing validity studies of the leaderless group 
discussion technique (6). Again, construct validity will be considered: 
the construct validity of the three measures purporting to gauge 
successful leadership will be compared. Logically, theory (7) 
suggested that:

1. Ability to solve the group's problems should be higher among 
those with higher assessments of what was purported to be 
successful leadership.

2. Esteem among associates should be higher among those with 
higher assessments.

3. Those with higher successful leadership scores should be 
observed by others as exhibiting more successful leadership.

4. Those who attempt more leadership should exhibit higher 
assessed successful leadership.

Ability and Successful Leadership 

Table 3 shows the correlations between 255 subjects' purported 
success as leaders and their abilities as assessed by (1) the American 
Council on Education Psychological Examination (ACE), a 
college entrance intelligence test, (2) their average initial accuracy 
in judging the familiarity of the words of 10 problems,¹ and 
(3) the subjects' academic standing through the sophomore 
year. (The effects of subsample differences in motivation and status 
were removed by obtaining correlations within subsamples and then 
averaging the results.)
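The text says only that correlations were obtained within subsamples and then averaged. One common way to do that, assumed here rather than taken from the source, is to average Fisher z-transformed coefficients weighted by n - 3 and transform the mean back to r. The subsample sizes 135, 60, and 60 come from the reliability section; the within-subsample r values below are hypothetical.

```python
import math


def average_correlation(rs, ns):
    """Pool within-subsample correlations via Fisher's z, weighting each by n - 3."""
    if len(rs) != len(ns) or not rs:
        raise ValueError("need one sample size per correlation")
    weights = [n - 3 for n in ns]
    if min(weights) <= 0:
        raise ValueError("each subsample needs more than 3 cases")
    # Transform to z, take the weighted mean, and transform back to r.
    z_bar = sum(w * math.atanh(r) for w, r in zip(weights, rs)) / sum(weights)
    return math.tanh(z_bar)


# Hypothetical within-subsample correlations for high, medium, and low motivation.
print(round(average_correlation([0.20, 0.30, 0.40], [135, 60, 60]), 3))
```

Pooling within subsamples keeps differences in subsample means (here, in motivation and status) from inflating or deflating a single correlation computed over all 255 subjects.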

TABLE 3

Correlations Between Ability and Three Objective
Measures of Successful Leadership

                          Successful Leadership
Measures of Ability       Public    Private   Relative

ACE                       .14*      .14*      .29**
Initial Accuracy          .21**     .21**     .36**
Academic Average          .13*      .12*      .19**

* p ≤ .05
** p ≤ .01

¹ Initial accuracy of a subject was the rank-order correlation between his
initial ranking of the familiarity of the set of 5 words and the correct rank
order of familiarity.
For all 255 subjects, a significant correlation (at the 1 per cent 
level) was found between the ACE and relative successful leader- 
ship. Relations with the other leadership measures also tended to 
be positive but somewhat less significant. Consistent with the 
higher reliability of the leadership measures, the correlations (not 
shown) were higher when subjects were lower in motivation. Intel- 
ligence seemed especially important to relative success as a leader 
among groups where members were more equal in status (r = .41). 

Average correlations of .21, .21, and .36 between initial accuracy 
and the three measures of successful leadership— public, private, 
and relative— were found for the 255 subjects. Again, the correla- 
tions were higher (.34, .40, and .46) when motivation was low. 

Academic performance showed the same pattern of relationships 
with successful leadership but the average correlations, although 
significant because of the large number of cases, were not much 
above zero. 

Generally, the results were consistent with the proposition that 
ability to solve the group's problems should be higher (but not too 
much higher) among those with higher assessments of leadership, 
supporting the contention that the assessments truly were measuring 
leadership. The relations are strongest between ability and relative 
successful leadership, in contrast to the other two measures of 
successful leadership.

Esteem and Successful Leadership 

Table 4 shows the correlations between the subjects' successful 
leadership during testing, their esteem as rated by their ROTC 
Tactical Officers based on observations over a two-year period, and 
their esteem as measured by their peers during the situational 
testing.

The members of each group tested rated each other on a 
five-point scale on how much loss to the group's effectiveness would 
be incurred if a particular member left the group. A member's 
self-rating provided a measure of his self-esteem. The average rating 
others in his group assigned him provided the measure of esteem 
by peers.

TABLE 4

Correlations Between Esteem and Three Objective
Measures of Successful Leadership

                                  Successful Leadership
Measures of Esteem                Public    Private   Relative

Tactical Officer's Evaluation     .14*      .15       [?]
Esteem by Peers                   .28**     .31**     .36**

** p ≤ .01

Low positive correlations, again higher where members were low 
in motivation, were found between the three leadership measures 
and the Tactical Officers' ratings. The correlations were lower than 
those commonly found when observers' ratings of successful 
leadership in initially leaderless discussions have been compared 
with Tactical Officers' ratings (6). Part of the difference may be 
due to the lower reliability of the objective data used here.

For all 255 subjects, esteem by peers was significantly related to 
the three measures of successful leadership (.28, .31, and .36). These 
correlations also showed the inverse relation with motivation: the 
corresponding correlations (not shown in Table 4) for only those 
subjects of low motivation were .32, .42, and .54. The results are 
consistent with earlier positive correlations between esteem and the 
objective measures of public and private success in leadership among 
95 night school students (10).

Again, the findings suggest that the leadership measures have 
validity as such. Of the three measures, relative success as a leader 
seems most valid as judged by its higher relation with esteem-by- 
others. 

Rated vs. Actual Success as a Leader 

At the end of the test, each member of a group checked items 
about whether or not every other member had attempted to 
motivate others and had not been ignored. The items were: aroused the 
interest of others; talked about the importance of group success; 
put others' suggestions into operation; changed desires of others; 
made others feel free to take part; inspired others; increased general 
level of activity; encouraged others to participate; and supported 
others.

These items were intermixed on the check list with a set of items 
concerning success in initiating structure: made the situation clear 
to others; gave others the information they wanted; proposed 
courses of action others wanted; helped the interactions among 
others; made plans acceptable to others; coordinated others' 
activities; and offered new solutions acceptable to others. (See 
Stogdill and Coons (67) for a detailed discussion of initiating structure.)

The average number of motivation items checked by all other 
members as descriptive of a subject's behavior provided the 
subject's subjective score as a motivator. Similarly, the average number 
of initiation items checked for a subject by the others provided a 
subjective estimate of the member's success as an initiator. Average 
correlations of .24, .31, and .28 were obtained between rated success 
as a motivator and the three measures of actual success as a leader. 
Average correlations of .25, .33, and .30 were found between rated 
success as an initiator and the three measures of actual success. 
High motivation reduced the relation for relative success as a leader 
but not for the other objective measures.

Attempted vs. Successful Leadership

Attempted leadership, measured by the average time in seconds 
a subject spent talking during each discussion, exhibited reliabilities 
of .71, .91, and .92 with decreasing motivation (11). In order to be 
successful as a leader, a member of a group must attempt leadership. 
A positive correlation is therefore expected between the two 
independently obtained measurements if both are truly measuring 
attempted and successful leadership, respectively.

Significant correlations of .17, .15, and .28 were found for the 
255 subjects. Again, probably because of higher reliability among 
those of lower motivation, slightly higher correlations of .19, .19, 
and .33 were obtained with the 60 subjects of lower motivation.

Construct Validity of Measures of Successful Leadership

Thus, we compared the construct validity of the measures of 
successful leadership by examining the correlations between the 
measures and the ability of 255 subjects, their esteem, their rated 
success as leaders, and their attempted leadership.

The initial accuracy and intelligence of the subjects predicted to 
some extent their success as leaders, as expected if the leadership 
measures were truly measuring leadership. Esteem of members by 
their peers in the group discussions and by their tactical officers in 
ROTC was also positively related to the leadership measures, 
as expected. Rated success correlated significantly with actual 
success as a leader, and, as expected, those with higher measured 
success as leaders attempted more leadership.

Generally, these correlations were higher when subjects were 
lower in motivation. This conformed to the fact that the reliability 
of the leadership measures was higher when motivation was lower. 
The correlations were also generally higher for relative successful 
leadership than for the absolute measures, public and private 
success as leaders.²


Testing Hypotheses About Leadership 


The purpose behind developing the objective measures of success- 
ful leadership was to test by suitable experiments a variety of 
hypotheses about leadership generated by a theory of leadership I 
have constructed. Here are some of the results: 

The theory suggested and results indicated that more successful 
leadership was displayed in 51 groups as problems grew more 
difficult regardless of other factors. 

According to the theory, ineffective groups should show more 
change subsequently in who exhibits successful leadership. A 
different outcome emerged in an empirical test with these 
measurements. All groups became or remained effective as long as 
they did not change leaders “in midstream.” The results suggest that 
it may be more important for groups to reach early agreement on who 
shall lead, regardless of whom they agree upon.

A third analysis of related data tested the proposition that the 
increased effectiveness of a group is related positively to the amount 
of successful leadership that occurs. These results emerged mainly 
where all members were equal in control and were more motivated. 

Our findings in these studies support a variety of other 
propositions concerning successful leadership. Many of these are 
well documented by other investigators examining the same issues 
with other techniques. Others are not. Some are “obvious,” 
“reasonable,” or common-sense; others are not. Some seem immediately 
applicable; others are facts to be filed for future reference.

Further Evaluation 


Our method for measuring successful leadership is relatively simple 
compared to other similar objective techniques. For example, in 
one such method (75), individuals were measured operating 
alone in their response to the autokinetic phenomenon. Then the 
scores of successive combinations of paired members were compared, 
and those found significantly different from each other were tested 
in pairs. Leadership occurred if one member did not change, yet 
members were significantly apart initially but were not significantly 
apart finally. While the definition of leadership used is almost 
identical with the one described in this paper, the method of 
measurement appears more complex and expensive.

Some other advantages of our method of studying behavior in 
groups by measuring the correlations of opinion before and after 
interaction include the following. The scores relate immediately to 
definitions of the theory of leadership, and the relationships found 
among the measurements may be used to examine hypotheses 
generated from this theory. Also, the measures are continuous and 
can be defined in algebraic notation. Moreover, each trial, providing 
a single measure, is short and self-contained, permitting 
application of repeated-measurement designs. Again, the procedure 
is widely generalizable in that the problems presented to the groups 
can be drawn from almost any type of subject matter requiring 
decision making or the making of ranked judgments. The 
measurements, in turn, while using widely varying content for 
problem solving, will remain directly comparable. Consequently, 
outcomes based on the relations among the measures yield 
generalizations of significance in the study of group phenomena as 
such.

The group interaction studied need not necessarily involve oral 
discussion or even face-to-face contact. All communications among 
members could be by written messages or by any devised symbol 
system without any loss of the effectiveness of the technique.

The data lend themselves to both digital and analog high-speed 
computer analysis. It is possible to proceed directly from the data 
collection session to actual machine processing without any 
intermediate clerical work.

Yet the process is “artificial,” unnatural, and restrictive. But for 
various purposes it may be worth the loss in “naturalness.” However, 
the procedure is not like Bales' response-classifying technique, 
which can be used to study groups operating under everyday 
conditions. The method can only be applied to such natural groups 
as a testing situation: the natural groups can be studied where the 
method is introduced to them as a screening examination or a 
teamwork test.

IX


An Actuarial Approach 
to Clinical Judgment * 

William A. Hunt 
Northwestern University 


During the past century the psychological approach to judgment has 
taken two divergent trends, with the adherents of each often in open 
conflict. One of these approaches, firmly associated with the 
experimental tradition and producing orderly, repetitive data that 
lend themselves to nomothetic treatment, I shall call the actuarial. 
The other, firmly anchored in clinical practice and producing 
individual data, highly unique in their nature and lending themselves 
to idiographic treatment, I shall call the intuitive. Rather than accept 
the current view of these as qualitatively different and irreconcilable, 
I shall take the position that they are merely the opposite poles of a 
rough continuum, a quantitative continuum marked by the clarity and 
specificity with which the stimuli are defined, by the degree to 
which the judgmental setting is standardized through careful 
control of the known pertinent variables and the elimination of 
extraneous cues, and by the provision of uniform modes of reporting 
or response that lend themselves to convenient mathematical 
treatment.


Actuarial Trend 

The actuarial trend in the field of judgment arises in the 
measurement of sensory mechanisms with Weber, Helmholtz, Fechner, 
etc., and shortly develops into the full stream of classical 
psychophysics, where today the lawful uniformities of judgment are 
being uncovered and investigated. The investigations are no longer 
merely in the traditional areas of sensation and perception but in 
the complicated fields of affectivity, aesthetics, social attitudes, etc. 
(29), and even in the field of clinical practice, as our data to follow 
will attest. Information theory, decision theory, and probability 
learning have all drawn from this source. Work in this classical 
tradition typically involves the use of physical stimuli, about which 
years of investigation in the physical sciences have taught us much. 
We can accurately specify how much we want of what, and can 
reproduce the stimulus conditions subsequently with some accuracy. 
I say some accuracy because there is always sufficient error 
variance demonstrable in our data to raise doubts concerning the 
complete identity of our repeated trials. That this margin of error or 
variability is minor and within acceptable limits does not negate 
its existence. Note also that as we move from physical materials 
to such complex stimuli as art objects, social situations, or 
schizophrenic verbal responses, the margin of error rises and is 
reflected in increasing variability of judgment. Yet our data remain 
within limits of communality that make it possible to treat them 
nomothetically rather than idiographically.

* This study derives from a larger project subsidized by the Office of Naval 
Research under contract 7 onr-!50<ll) with Northwestern University.

The experimental situations in which psychophysical judgments 
are obtained are laboratory ones, highly artificial and with all known 
extraneous cues carefully removed. Very explicit and meaningful 
instructions are given to the judge or observer to control his 
reactions. The categories used are clear, understandable, and 
common, such as “greater” or “less,” “heavier” or “lighter,” or the 
simple numerals of some quantitative scale. Again, we may find 
variability of greater or lesser amount as we vary our techniques, 
but it either remains within stable limits or we improve the 
technique. By [...] of measures, by smoothing curves, we arrive at an 
illusion [...] or even of complete communality that the actual 
dispersions of the raw data often belie. If we ask ourselves the 
question, “Did subsequent trials duplicate the conditions of the 
[...]?” our answer must be, “Probably not exactly, but sufficiently 
so for us to assume the repetitiveness necessary for sequential data 
upon which to base statistical predictions.”
Intuitive Trend in Judgment


The intuitive trend in judgment has as long a history as the 
actuarial trend, albeit not as scientifically respectable a one. It 
comes down to us through the Germanic geisteswissenschaftlich 
approach, through such cultural historians as Dilthey and Spengler, 
and through Spranger, the student of personality, and his concept 
of Verstehen, or understanding. It culminates today in clinical 
psychology and psychiatry in what we call clinical intuition, but what 
I would prefer to label clinical judgment. Here we find the clinician 
as judge faced with stimuli that are not clearly defined (What is 
schizophrenia and how much is a lot of it?), that cannot easily be 
controlled and reproduced, and that hence raise questions as to whether 
the communalities exist between trials that permit us to assume 
the repetitiveness necessary for sequential data and statistical 
prediction.


The judgmental situation is difficult to control, and extraneous 
variables intrude as the clinician makes his judgment, at one time 
of a patient in an open social situation in a hospital ward, at 
another time in the relatively restricted environment of an examining 
room or office. Nor are the categories of report clear and specific. 
They may be in the vague nosological terms of one of our current 
diagnostic systems, or they may be couched in such general terms 
as “suicidal risk” or “assaultive.” They may even be such general 
statements as “severe anxiety springing from oedipal problems,” or 
such specific ones as “this patient should not be allowed to view the 
film, The Three Faces of Eve, at the present stage in her 
treatment.” Yet communalities often may be teased out, and highly 
complicated stimulus situations may yield sufficiently reliable data 
for usable predictions, even with the relatively unsatisfactory 
categories of classical Kraepelinian nomenclature and a relatively 
undeveloped (compared to the physical sciences) psychiatry to aid us 
in specifying and clarifying the symptomatological behaviors which 
may serve as stimuli.

At every point where we even approach the exactness, specificity, 
and control of stimulus, judgmental setting, and mode of report 
which are typical of experimental psychophysics, communalities 
appear in our clinical data, and the sequential repetitiveness 
necessary for statistical prediction begins to show itself. There is 
both logic and data, and I shall report some of this shortly, to 
support my position that the clinical judgment is qualitatively 
related to the psychophysical judgment, that the differences are 
ones of quantity rather than kind, that they are contrasting in 
amount rather than conflicting in nature, and that the clinical 
judgment is a culturally and educationally handicapped country 
cousin of the psychophysical judgment, and not a different species 
of being.

The Argument Reviewed 

The argument so far runs like this: the repetitive or sequential 
data necessary for establishing probability inferences can be ob- 
tained from the clinical situation. Careful examination will show 
the possibility of locating and controlling communalities in stim- 
ulus, setting, and report so that judgments may be repeated under 
like conditions and sequential data may be obtained. Thus a loose 
class of symptomatic behaviors in a patient observed by a clinician 
whose training has some uniformity with that of other clinicians 
and whose judgments are made in an at least partially controlled 
observational setting may produce reports in terms of some diagnos- 
tic category on the basis of which valid predictions can be made 
concerning the patient’s future behavior. These results can be 
duplicated on subsequent occasions with the same and even other 
clinicians. The less possible it is to duplicate the essential conditions 
(which is another way of saying the more unique each judgmental 
setting is) the less reliable will be our predictions, but clinical prac- 
tice contains many judgmental situations in which the common 
aspects far outweigh the unique ones and in which meaningful 
sequential data may be obtained. 

While agreeing that many clinical settings are marked by com- 
mon and duplicable elements, one can still argue that genuinely 
unique situations arise where it is impossible to repeat the situation 
and hence to establish actuarial weightings to guide our predictions. 
This is the idiographic position in the idiographic-nomothetic 
controversy, and it is stoutly maintained by many clinicians who would 
claim that the major characteristic of clinical practice is that cli- 
nicians encounter completely unique patients with completely 
unique developmental histories in completely unique environments, 
necessitating a completely unique prediction which, by definition 
of its uniqueness, is not amenable to actuarial treatment. I doubt 
the frequency of such occurrences, but let us accept them as pos- 
sibilities. 

With no communalities apparently involved, such a unique 
situation cannot be approached actuarially. Does this deprive us of 
all opportunity of deriving probability weightings to guide us in 
making predictions? Before answering this question, let me state 
that I think any act of human judgment is in part unique, even in 
psychophysics, although in this latter field we can reduce the 
uniqueness and diminish the error variance in our measures by 
careful cultivation and control of communalities of stimulus, task 
setting, and report. The uniqueness is then minimized and becomes 
statistically unimportant, but it remains, as anyone who has spent 
hours in making psychophysical observations and in processing 
psychophysical data can testify. But we have deliberately chosen 
an example which admits no communality of stimulus, task setting, 
or report. Are we then deprived of any chance of deriving 
probability weightings in this admittedly (but probably purely 
hypothetically) unique situation? By no means. Our solution is to 
transfer the actuarial locus to the clinician himself.

If the occurrence of such unique situations is as common as the 
adherents of the idiographic approach would have us believe, the 
chances of any clinician meeting only a single one are practically nil. 
He will meet many of them. Thus, while each prediction may be 
unique in itself and therefore inaccessible per se to probability 
estimates, the clinician himself furnishes a common repetitive 
element in the judgmental situation. We can evaluate the success 
of each individual prediction and arrive at an overall actuarial 
expression of how often any clinician has been correct in a past 
series of unique predictions. The probability weighting which 
results can then be transferred to any future unique prediction for 
estimating its probable correctness. If clinician A has been right 
9 times out of 10 in 10 past unique predictions, we can then use 
this “9 out of 10” as a weight to infer the chances that his next 
individual prediction will be correct. We may then extend our 
reference class from clinician A to clinicians A-Z, and even to all 
clinicians, or narrow it to clinician A in one specially defined 
situation, etc., thus achieving probability weightings of increasing 
applicability. And as we derive such probability weightings, we 
shall undoubtedly begin to discover previously unrecognized 
communalities existing elsewhere in the situation. The actuarial locus 
may then be shifted to these, once they have been recognized. New 
diagnostic categories, further understanding of the variables involved 
in the process of judgment itself, and the development of suitable 
categories of report will all add predictive power in time. 
Meanwhile, transferring the actuarial locus to the clinician removes a 
semantic stumbling block that does much to impede clinical progress 
today.
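The proposal reduces to a simple frequency: the proportion of a clinician's past unique predictions that proved correct, computed within whatever reference class we choose. The sketch below assumes a hypothetical record format of (clinician, situation, correct) tuples and a hypothetical function name, `batting_average`; widening or narrowing the reference class is just a matter of which filters are applied.

```python
def batting_average(records, clinician=None, situation=None):
    """Proportion of past predictions judged correct within a reference class.

    records: list of (clinician, situation, correct) tuples (hypothetical format).
    A filter left as None widens the class, e.g. from clinician A to all clinicians.
    Returns (proportion_correct, number_of_cases).
    """
    matched = [correct for who, where, correct in records
               if (clinician is None or who == clinician)
               and (situation is None or where == situation)]
    if not matched:
        raise ValueError("empty reference class")
    return sum(matched) / len(matched), len(matched)


# Hypothetical data: clinician A was right 9 times in 10 past unique predictions.
records = [("A", "ward", True)] * 6 + [("A", "office", True)] * 3 + [("A", "office", False)]
records += [("B", "ward", True), ("B", "ward", False)]
print(batting_average(records, clinician="A"))  # -> (0.9, 10)
print(batting_average(records))                 # widened to clinicians A and B
```

Each individual prediction stays unique; only the clinician (or the clinician in a given setting) supplies the repeated element that makes the proportion meaningful.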
The Clinician as an Instrument

In logical terms, what we have done is to set the clinician up as a reference class from which we can derive the sequential observations necessary for an inference of probability. When we apply this probability interpretation to the single case, we are using a “transfer of meaning.” Such transfers are justified by expediency for the purpose of action. As Reichenbach says, “The frequency interpretation provides merely a substitute for the probability of a single case, the choice of the substitute depends on our state of knowledge, inasmuch as we have to look for the narrowest reference class for which reliable statistics are available. But these qualifications do not represent any serious obstacles to the frequency interpretation, they merely portray the actual procedure used in all applications of statistics to individual cases” (27).
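
Reichenbach's rule of using the narrowest reference class for which reliable statistics are available can be sketched as a simple search from narrow to broad. The cut-off `min_n` standing in for “reliable statistics,” the function name, and the counts are hypothetical; a fuller treatment would weigh sample size against relevance rather than apply a hard threshold.

```python
from typing import List, Tuple


def narrowest_reliable_rate(classes: List[Tuple[str, int, int]],
                            min_n: int = 30) -> Tuple[str, float]:
    """Return (label, hit rate) for the narrowest class with at least min_n cases.

    classes: (label, hits, n) tuples ordered from narrowest to broadest.
    min_n is an arbitrary, hypothetical stand-in for "reliable statistics".
    """
    for label, hits, n in classes:
        if n >= min_n:
            return label, hits / n
    raise ValueError("no reference class has enough cases")


# Hypothetical counts, narrowest class first.
classes = [
    ("clinician A, this situation", 9, 10),
    ("clinician A, all situations", 40, 50),
    ("clinicians A-Z", 700, 1000),
]

print(narrowest_reliable_rate(classes))           # ('clinician A, all situations', 0.8)
print(narrowest_reliable_rate(classes, min_n=5))  # ('clinician A, this situation', 0.9)
```
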

I was groping toward such a solution in 1946 when I said, “We should consider the individual clinician as a clinical instrument, and study and evaluate his performance exactly as we study and evaluate a test” (11). In 1951 I dealt with the “idiographic dilemma” that arises out of the application of probability theory to a single case and suggested, “The justification of a good clinician depends not upon his success with any single case, but upon his overall batting average over a period of time and with a number of cases, this situation probability theory can handle” (9). In 1955 some of these principles were used in interpreting the rationale of psychiatric selection (8). They have been treated at such length here not merely to furnish the developmental background for the experimental material to be presented later, but because of the vital importance of the question in the ideological and practical development of clinical psychology. If one accepts a sharp, qualitative distinction between the idiographic and the nomothetic on the basis of the inaccessibility of the first to statistical treatment, and then relegates clinical judgment to the idiographic, the scientifically minded clinician is encouraged to neglect the experimental investigation of a basic clinical phenomenon. The non-scientifically minded clinician is encouraged in the carefree practice of a happy [...] which necessitates no evaluative forethought or afterthought, since it lies outside the realm of empirical validation.
Idiographic Versus Nomothetic

The most lucid and inclusive discussion of this idiographic-nomothetic controversy to date appears in Paul E. Meehl's classic monograph of 1954, Clinical versus Statistical Prediction (26). As both an accomplished statistician and an accomplished clinician, Meehl writes with an intensity of feeling that testifies that his book is no dry scholarly exercise but the working through of a vital personal issue. He concludes soundly, roundly, and with convincing logic that clinical prediction should be based upon actuarial procedures. From his logic there would seem to me to be no escape, and it is indeed the position taken in this paper. The only alternative would be a flight into mysticism, which would be disastrous for clinical psychology as either a “behavioral science” or a “healing art.”

With his logic, then, I can find no argument. With some of the content which he builds into his logical structure, I would demur. He could have made his case equally well without buttressing it by what seems at times an unjustified demeaning of the actuarial potentialities of clinical judgment itself. By opposing clinical judgment (intuition, if you will) to such well-developed actuarial techniques as the MMPI, he at times comes dangerously close to denying any actuarial potentiality for the judgmental approach. As my colleague Dr. Roy Hamlin, a persistent researcher in this field of clinical judgment (and a thoroughly objective one), so pithily remarked during an APA round table discussion, Meehl not only makes a straw man of the clinician but takes his pants away as well.

It is unfortunate that Meehl has chosen to buttress his case for actuarial procedures by using comparative data between test and clinical procedures in the selection field. The evaluative criteria in selection are very complex, and the simple probability statistic which he applies is not suitable to this complexity. As Cronbach and Gleser (5) have recently pointed out in their “Psychological Tests and Personnel Decisions,” the statistics of games or decision theory are more suitable for sophisticated evaluation in this field. I have no doubt that actuarial test procedures, where adequate and available, ultimately would still prove more efficient than clinical ones, but the race might well be a closer one were a different evaluative statistic to be applied. To illustrate the problem, we have some data, never published in complete form and admittedly involving some inference and extrapolation, which suggest that in one situation both methods ran neck and neck in percentage of failures identified, but differed widely in the false positive rate (31). There is no simple technique for equating “hits” and “false positives” in evaluation without the employment of exceedingly complicated decision statistics, which unfortunately were only slowly being understood and adapted at the time.
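
The difficulty of equating “hits” and “false positives” can be illustrated with invented counts: two screening methods that flag the same share of later failures but differ in false positive rate. Which one looks better then depends entirely on the relative cost assigned to a missed failure and to a false alarm, the kind of weighting that decision-theoretic treatments make explicit. The counts, the cost values, and the helper names below are hypothetical, and the sketch does not reproduce the data of (31).

```python
from typing import Tuple


def rates(tp: int, fn: int, fp: int, tn: int) -> Tuple[float, float]:
    """Hit rate (share of later failures flagged) and false positive rate
    (share of later successes flagged)."""
    return tp / (tp + fn), fp / (fp + tn)


def cost_per_case(tp: int, fn: int, fp: int, tn: int,
                  cost_miss: float, cost_false_alarm: float) -> float:
    """Average cost per screened case under hypothetical unit costs."""
    return (fn * cost_miss + fp * cost_false_alarm) / (tp + fn + fp + tn)


# Hypothetical: 1,000 cases screened, 100 of whom later fail.
method_x = (60, 40, 90, 810)    # tp, fn, fp, tn
method_y = (60, 40, 180, 720)

print(rates(*method_x), rates(*method_y))  # same hit rate, different false positive rate
for cost_false_alarm in (0.0, 1.0):
    print(cost_false_alarm,
          cost_per_case(*method_x, cost_miss=5.0, cost_false_alarm=cost_false_alarm),
          cost_per_case(*method_y, cost_miss=5.0, cost_false_alarm=cost_false_alarm))
```

When false alarms are assumed to cost nothing the two methods tie; as soon as they carry any positive cost, the method with fewer false positives comes out ahead, so a single “percentage identified” figure cannot settle the comparison.
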

Moreover, by limiting himself to studies in which both methods 
are used, while tightening the logical force of his argument, Meehl 
inevitably neglects any instance in which clinical prediction is used 
independently with some success. I can think of two such studies 
from our own work (30, 21). Let me state again, my faith in the 
actuarial procedure is unshaken. I merely feel that a more sympa- 
thetic treatment of clinical procedures might have uncovered evi- 
dence that they are more efficient than Meehl would lead his read- 
ers to believe. 

Finally, there is what seems to me to be a serious flaw in the 
design of Meehl's comparative study. In most cases he compares
selected tests against unselected clinicians. He compares carefully 
developed actuarial procedures, into whose standardization and 
validation for the specific situation in which they are applied tre- 
mendous effort has gone, with clinicians about whose training and 
efficiency little or nothing is known and who probably have had 
little intensive training for the specific predictive task into which 
they have been thrown. It would be equally fair to compare the 
predictive performance of a group of highly selected and specially 
trained clinicians with that of the early Bernreuter Personality In- 
ventory. The results might not be invidious to clinicians. 

I often wonder what the results would have been if the time, energy, and financing that have gone into the development of the MMPI had been put into the training of a group of clinical psychologists or psychiatrists in interviewing toward the goals of a specific predictive situation. These criticisms do not destroy the logic of the actuarial position. They do not invalidate any conclusion that in the specific situations Meehl mentions the current predictive efficiency of actuarial test techniques is greater than that of the clinical techniques used. They do cast some honest doubt on any sweeping conclusion that there is no hope for the future predictive potential of clinical judgment, particularly if it receives actuarial study and development such as this paper proposes.

I doubt if Meehl would raise any serious objection to what I have said. As he himself remarks, “I would defend simultaneously (and, [...], consistently) the two propositions that (1) there are some behavior phenomena which cannot be best studied in the laboratory, at least with any confidence in one's extrapolations, and (2) until some quantification, at least frequency counts and contingency measures, is applied to clinical evidence, we can have very little confidence in our claims.” With this I am in total agreement. Perhaps our difference, if there is one, could be summarized by saying that I feel that Meehl views the exercise of clinical judgment as a necessary evil, whereas I view it as a fascinating phenomenon with a genuine predictive potential.


Present Position and Purpose 

Were I to state my own position, it would be this:

1. There are some behavioral phenomena which cannot best be studied under the controlled laboratory conditions necessary for the development of sophisticated actuarial techniques such as objective test devices, and consequently such sophisticated actuarial devices are not currently available for use in studying such phenomena.

2. Clinical judgment furnishes a technique for the study of many of these behaviors that is necessary, suitable, and promising.

3. Such clinical judgment must and can be subjected to some actuarial evaluation and developed and improved along actuarial lines.

4. The use of clinical judgment is a necessary and inevitable preliminary step in our technical evolution toward an actuarial goal (10). Again, I suspect that Paul Meehl would be sympathetic with this formulation.

Before presenting data which we currently are obtaining from our actuarial approach to clinical judgment, let me state that the motivation for our program stems not from any a priori logical formulation, but from the hard reality of certain experimental findings obtained in our earlier investigation of the efficacy of psychiatric selection in the U.S. Navy in World War II. This program and its rationale have been presented elsewhere (8), but I should like here to review briefly several of the studies from which our interest in clinical judgment arose.


Early Studies 


A large-scale validational study of the efficacy of the Navy's neuropsychiatric screening program furnished evidence that it was successful in reducing subsequent psychiatric attrition during service (19, 18). Since clinical judgment in the form of individual clinical diagnoses and predictions from interview impressions, etc.,
formed an important part of the screening technique, it was im- 
possible to conceive of how the general program could be valid if 
such an integral part of it were invalid. In two subsequent studies 
we investigated clinical prediction specifically. In the first of these 
a clinician graded a group of 944 seamen suspected of maladjust- 
ment. A rough quantitative scale based on the categories mild, 
moderate, and severe was used, and subsequent attrition in each 
category confirmed the quantitative judgment of the clinician (30). 
This was a pre-planned predictive experiment, and not set up as a 
post hoc study. The clinician was selected on the grounds of his 
general professional competence, it is true, but we were not interested in what anyone, irrespective of ability or training, could do
but in the performance of a reputable clinician. After all, this is 
the same kind of selection that is used in choosing the objective 
tests used in selection experiments. 
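
A check of this kind reduces to computing the attrition rate within each of the clinician's grades and asking whether it rises from mild to severe. The sketch below assumes a simple (grade, attrited) record per man; the helper names and counts are hypothetical, not the figures from the study of 944 seamen.

```python
from typing import Dict, Iterable, Sequence, Tuple


def attrition_by_grade(cases: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """Attrition rate within each clinician grade."""
    totals: Dict[str, int] = {}
    losses: Dict[str, int] = {}
    for grade, attrited in cases:
        totals[grade] = totals.get(grade, 0) + 1
        losses[grade] = losses.get(grade, 0) + int(attrited)
    return {grade: losses[grade] / totals[grade] for grade in totals}


def ordering_confirmed(rates: Dict[str, float],
                       order: Sequence[str] = ("mild", "moderate", "severe")) -> bool:
    """True if attrition rises strictly across the graded categories."""
    values = [rates[grade] for grade in order]
    return all(lo < hi for lo, hi in zip(values, values[1:]))


# Hypothetical counts only.
cases = ([("mild", True)] * 10 + [("mild", False)] * 90
         + [("moderate", True)] * 20 + [("moderate", False)] * 60
         + [("severe", True)] * 30 + [("severe", False)] * 30)

rates = attrition_by_grade(cases)
print(rates)                      # {'mild': 0.1, 'moderate': 0.25, 'severe': 0.5}
print(ordering_confirmed(rates))  # True
```
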

Encouraged by these results, we then made a study of the pre- 
dictive value of certain diagnostic categories such as neurosis, 
schizoid personality, asocial psychopathy, etc., relating the original 
diagnosis to subsequent behavior during service. Many clinicians 
furnished the judgments used this time. While the study has never 
attracted wide attention, it is, I believe, one of the few experimental 
studies in which actual correlates in common social behavior were 
shown for the diagnostic categories involved. Neurotics were shown 
to be less of a disciplinary problem than a normal control group, 
but to have a preponderant incidence of alcoholic behavior when 
they did get into trouble. Schizoid personalities showed no inci- 
dence of alcoholic difficulty but were a leave and insubordination 
problem. The psychopaths were, of course, outstanding as a source 
of disciplinary difficulty, but were particularly noticeable for in- 
subordination (21). Replication confirmed these findings (20). Not 
only were we encouraged by this evidence of clinical competence,
but by the further evidence of some reliability and validity in these
common diagnostic categories. The presence of such behaviors 
promises an objective observational basis on which valid clinical 
judgments may be based. 


Follow-up and Objectives 

Two further studies supported our belief that the clinical judgment that we had been working with was a valid, lawful phenomenon. We reasoned that the diagnostic judgment should be easier the more maladjusted the interviewees were, and that, since the basis for the judgments must rest on observable behavior, the more maladjusted the interviewed group was, the more symptomatic behaviors the clinician would note during his interview. Both of these hypotheses were confirmed (23, 22). By now we were convinced that clinical judgments could be valid and reliable, that clinical judgment itself was a lawful, orderly process open to experimental analysis, and that its study and development offered the promise of supplying a useful clinical technique in those situations where more formal actuarial techniques, such as objective tests, were not available. We were convinced, however, that to be useful it must be approached and developed as an actuarial technique. As a consequence, we began the experimental program I shall now discuss.
To assist in clarifying the objectives of our program, let me say that we envisage it as a combination of basic and applied research: basic in that we wish to understand the nature of clinical judgment, applied in that we hope our understanding will further its useful potential as a tool in clinical practice. In any research sufficiently well planned and executed to merit the name, it is difficult to separate “pure” and “applied” aspects. I cannot imagine any basic research which does not have implications for the control and manipulation of man and his environment. A narrow involvement in practicing technique for technique's sake, a type of schizoid, narcissistic laboratory play activity, may and does result in counting the pickets in a picket fence, experimentation which may be temporarily accepted, but whose absurdity is sooner or later recognized. Nor can I imagine any applied problem which, if analyzed and thoughtfully approached, does not add something to our knowledge of the phenomenon involved. There may be experimenters who refuse to look beyond the immediate significance of their data, but this is a human limitation and not one of methodology.

This blending of basic and applied interest, this cross-fertilization and mutual facilitation between approaches, is nowhere better illustrated than in the flourishing field of military research, where the attempt to answer practical problems has necessitated the furtherance of fundamental knowledge, and where in turn the fundamental knowledge gained has led to new practical applications. Again, one senses a quantitative rather than a qualitative difference, and I have been casting about for some parameters on which this could be expressed if not measured. Two seem to me to be of use in expressing the quantitative relation between the basic and applied approaches. These are the generalization potential of any group of scientific data, and their resistance to scientific obsolescence.

The more widely we can generalize from our specific laboratory situation to other diverse fields, and the longer our findings remain useful without fundamental revision (resistance to obsolescence), the more basic our research is. Using a military illustration, it is the difference between spending thousands of dollars to obtain information about the fundamental processes of perception, which can then be applied widely and over a long period of time to many different problems of recognition and decision making, and spending thousands of dollars on a specific recognition problem involving a particular viewing scope, when the results have little implication for other weapons systems, and where rapid technological progress may make the particular weapons system outmoded before the research can even be published. There are many similar problems in training and education, involving both training methods and their evaluation (testing).

Method 


In our work we have attempted a happy compromise between findings that would have some immediate application in professional practice and yet have some basic value in terms of potential generalization to other situations and some resistance to the obsolescence attributable to foreseeable changes in psychological interest and practice. We have used stimulus materials which are relatively clear and easily duplicable. The judgmental situation is realistic, convenient, and fairly controllable. Our observers are representative of their class. The categories of response are understandable, easy to use, and lend themselves to mathematical treatment. The total setting attempts some of the experimental rigor of the psychophysical situation without losing the semblance of actual clinical practice.

The stimuli used were verbal responses to intelligence test items, specifically items from vocabulary and comprehension tests. These can be repeated with exactitude and are easy to present. The evaluation of such materials for qualitative cues of personality disorder is a common experience in clinical practice. The stimuli were presented on mimeographed sheets with space provided for recording the evaluation. The task in essence involves the clinician sitting at his desk and evaluating test responses in a reasonable approximation to normal working conditions. This also frees us from the necessity of calling each clinician into a fixed laboratory situation, thus
increasing the number of clinicians available and the geographical 
range from which they may be drawn. We concentrated on re- 
sponses to vocabulary items since vocabulary is a brief, easily ad- 
ministered, popular type of test of wide usefulness which has stood 
up well through the years and promises to be valuable for some 
years to come. It may be scored objectively for purposes of com- 
parison, and offers a relatively fertile sample of the thinking proc- 
esses of the subject. 

The judges were all well trained clinicians. The original min- 
imal criteria for selection were a Ph.D. and four years of full time 
professional experience. Most of them were well beyond this minimum. While our 48 judges must remain anonymous, they are recognized, well-established professional people, drawn from all over
the country, and include many of the leaders in the profession 
today. Our naive subjects were all drawn from undergraduate stu- 
dents in psychology at Northwestern University. We must remem- 
ber, however, that their naivete was relative. They are certainly 
more intelligent and more sophisticated psychologically than the 
average man. 

The judgments were in terms of a 7 or 9 point quantitative scale, 
running from the minimum to the maximum of the phenomenon 
being evaluated. Such scales are understandable, easy to handle, 
and lend themselves to mathematical treatment. We must remem- 
ber, however, that they are rough ordinal scales and as yet we have 
made no attempt to meet such problems as equal-interval steps, 
etc. Standardized instructions were used, with every attempt being made to render them clear and unambiguous. We will set forth below the results of our studies in rough chronological order, and then discuss their implications.


First Results 

As sometimes happens, our first approach to the problem was overly ambitious and somewhat disastrous. Reasoning that the effects of stimulus context which produce judgmental distortions in classical psychophysics should also appear in clinical judgment, Arnhoff attempted to find anchoring effects in a situation involving ratings of the amount of schizophrenic confusion exhibited in patients' responses to vocabulary test items (1). Professional clinicians, graduate students in clinical psychology, and naive undergraduates were used as subjects. Anchoring effects were not demonstrable, the reliability of the clinicians' judgments varied widely, and, to our embarrassment, reliability as measured by the standard deviation of the mean judgments was inversely related to level of training. This last was particularly upsetting.

Fortunately, interviews with some of our subjects and careful inspection of our data indicated that two opposite context effects, contrast and assimilation, might be cancelling one another out in the statistical analysis of our data, and there were indications that our instructions might be too general and allow too much opportunity for individual interpretation by our trained clinicians. As Arnhoff has aptly stated it, “When dealing with experts in a judgmental situation, the task should be well defined and the criteria set forth clearly. Otherwise the riches of knowledge may yield confusion rather than clarity” (1).

Partly as a result of our frustration and partly to illustrate some of the pitfalls inherent in such work, we wrote a brief, humorous note on “Reliability, Chance, and Fantasy in Inter-Judge Agreement among Clinicians” (14), which was intended as a tongue-in-cheek pedagogical device. Fortunately it was accepted as such, with amused approval, by many of our colleagues, but unfortunately it was taken seriously by others. It taught us a lesson of caution about interjecting subtle humor into the deadly serious business of science.

Despite the discouraging beginning, we remained confident of the promise of our approach, and decided to continue with a less ambitious, better planned, progressively analytic approach to the problem. The first step we deemed necessary was to answer the basic question of the reliability of clinical judgment. If we could establish this, we could then proceed to investigate some of the more complex phenomena of judgment.


Further Work 

Accordingly, Arnhoff and I (13) selected 50 of the most reliable vocabulary test responses from a pool standardized for his previous study. To these we added 50 comprehension test responses, also gathered from schizophrenic patients. These were rated by 16 professional clinicians in the Chicago area. A 7-point scale was used, and we wrote improved instructions designed to eliminate the previous ambiguities. Judgment was on the basis of “how schizophrenic each of these responses is.” We defined reliability both as agreement of each judge with the group, and as individual consistency upon retesting. The original ratings were repeated after intervals of 3 and 18 months.

Agreement of the judges with the group on vocabulary responses showed r's ranging from .73 to .92 for the first rating, .69 to .95 on the second, and .81 to .92 on the third. For comprehension items these were .64 to .88, .66 to .86, and .71 to .89, respectively. Test-retest r's between the first and second ratings for each clinician ran from .65 to .91 for vocabulary, and .68 to .90 for comprehension. Since all the clinicians were from the Chicago area, we ran a cross-validational group of 16 other clinicians from all areas of the United States. A single rating only was obtained from this group. The judge-group r's ran from .59 to .92 for vocabulary and .63 to .90 for comprehension. Correlations between the mean values assigned each stimulus by the first group (original rating) and the new group were .93 for vocabulary and .96 for comprehension. These reliabilities are quite high and led us to believe not only that such ratings are reliable for clinical use, but that they can be used for experimental purposes in the further investigation of the nature of judgment.

Arnhoff's finding that reliability, as defined by the standard deviation of the mean stimulus values, was in inverse relation to amount of professional training still bothered us. Consequently, with the aid of Nelson Jones and Mrs. Hunt, I made several studies of the judgments given by naive undergraduate students using our new and improved instructions (17). While the mean value was not significantly different from that of the trained clinicians, the standard deviation of the mean was significantly greater, indicating that, contrary to Arnhoff's original results, clinical experience is acting to increase the reliability of the professional clinicians' judgments. The correlation between the mean stimulus values of the two groups was .88, however, indicating a high degree of agreement between their ratings.
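
The standard deviation used here as an index of reliability can be read as the spread of the judges' ratings around each stimulus's mean value, averaged over stimuli; a tighter spread means closer agreement among judges. That reading, the helper name, and the ratings below are assumptions made for illustration only, not the computation reported in (17).

```python
from statistics import mean, stdev
from typing import List, Sequence, Tuple


def stimulus_means_and_spread(ratings: Sequence[Sequence[float]]) -> Tuple[List[float], float]:
    """ratings[j][s] is judge j's rating of stimulus s (at least two judges).

    Returns the mean value assigned to each stimulus and the average, over
    stimuli, of the standard deviation of the judges' ratings of it.
    """
    columns = list(zip(*ratings))          # one tuple of ratings per stimulus
    means = [mean(col) for col in columns]
    spread = mean(stdev(col) for col in columns)
    return means, spread


# Hypothetical ratings of three stimuli by three judges in each group.
clinicians = [[2, 5, 7], [2, 6, 7], [3, 5, 6]]
undergraduates = [[1, 5, 7], [3, 7, 5], [2, 3, 6]]

_, clinician_spread = stimulus_means_and_spread(clinicians)
_, naive_spread = stimulus_means_and_spread(undergraduates)
print(round(clinician_spread, 2), round(naive_spread, 2))  # 0.58 1.33
```
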

Increasing the Gap Between Clinicians and Naive Raters 

Encouraged by the high reliabilities obtained thus far, [...] and I (15) extended our investigation beyond the simple dimension “schizophrenic” to include more subtle dimensions reflecting aspects of schizophrenic thinking. Our prediction was that the judgments of trained clinicians would diminish in reliability but still be within acceptable limits, while the gap in performance between trained and naive judges would increase as the ratings became more difficult. The stimuli were the 50 vocabulary responses previously used. The new dimensions were potential intelligence (defined not in terms of the correctness of the response, but in terms of the potential intelligence level indicated by it), communicability (a dimension of private-public meaning), and concreteness (the classical concrete-abstract aspect of schizophrenic thinking). The experienced clinicians were 31 of our previous subjects, all of whom had rated the words earlier for “schizophrenic.” All 31 made ratings of “potential intelligence,” 15 of “communicability,” and 16 of “concrete-abstract.” The naive subjects were a new group of undergraduates, 30 of whom judged on the schizophrenic dimension, 30 on intelligence, and 15 each on communicability and concrete-abstract.

Reliability was defined as each judge's agreement with the group. For the trained clinicians the r's for schizophrenia ranged from .63 to .94 with a median of .88; for potential intelligence, from .22 to .83 with a median of .70; for communicability, from .70 to .91 with a median of .80; and for concrete-abstract, from .38 to .71 with a median of .55. Reliability dropped but was still within respectable limits. For the undergraduates the r's were: schizophrenia, −.50 to .89, median .72; intelligence, .14 to .83, median .61; communicability, .39 to .85, median .75; concrete-abstract, −.54 to .73, median .60. Negative values were contributed by only three subjects. Again, while not quite as reliable, the undergraduates ran fairly close in performance to the clinicians. The r's between the mean value assigned each stimulus by the two groups were .95 for schizophrenia, .83 for intelligence, and .94 for communicability, but −.43 for concrete-abstract. The picture of disagreement became sharper when we studied the pattern of interrelationships between our dimensions.

Since we were using the same 50 stimulus words in each judgmental setting, it became possible to compare the relation between our four attributes of “schizophrenia,” “potential intelligence,” “communicability,” and “concrete-abstract” through rank-order correlations based on the mean value of the stimuli for the four types of ratings. For our experienced clinicians the dimension of schizophrenia was highly negatively correlated with communicability, not related to intelligence, and minimally related to a tendency to concreteness. There was no relation between intelligence and communicability, but a high one between intelligence and abstraction. The r between communicability and abstraction is significant but low. With the possible exception of schizophrenia and communicability, it seems obvious that our clinicians are differentiating between the dimensions.
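
The rank-order correlations between dimensions can be sketched as Spearman's rho computed on the mean stimulus values for each pair of scales. The helper names and the five-stimulus profiles below are hypothetical, and giving tied values their average rank is a common convention that the text does not specify.

```python
from itertools import combinations
from typing import Dict, List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Rank-order correlation: Pearson r computed on the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    m = (len(x) + 1) / 2                   # mean rank, ties included
    sxy = sum((a - m) * (b - m) for a, b in zip(rx, ry))
    sxx = sum((a - m) ** 2 for a in rx)
    syy = sum((b - m) ** 2 for b in ry)
    return sxy / (sxx * syy) ** 0.5


# Hypothetical mean scale values for five stimuli on each dimension.
dimensions: Dict[str, List[float]] = {
    "schizophrenia": [6.1, 2.3, 4.8, 1.5, 5.2],
    "communicability": [1.9, 5.8, 3.0, 6.4, 2.7],
    "potential intelligence": [4.0, 4.6, 3.1, 5.0, 5.5],
}
for a, b in combinations(dimensions, 2):
    print(f"{a} vs {b}: {spearman_rho(dimensions[a], dimensions[b]):+.2f}")
```
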

The picture of schizophrenic thinking presented by these findings is in accord with conventional thinking except for the low relation between schizophrenia and the tendency to concreteness. Concrete thinking classically has been attributed to the schizophrenic. Our results, however, are in accord with those of McGaughran and Moran (25), obtained in a sorting experiment, where they found a tendency to personal, noncommunicable thinking more typical of the schizophrenic than a tendency either to concreteness or to abstraction.

For the naive undergraduates, however, the results are quite different. Everything is highly related, either positively or negatively, to everything else. The evidence would indicate that they are not distinguishing between the scales, and possibly are falling back upon some common denominator as a basis for all their judgments. In any case, the findings confirm the variability picture, and it seems evident that while our naive subjects perform rather well when rating for broad, general aspects of disorder, they fall down when given more subtle dimensions to use. It is at this point that the superiority of our trained clinicians becomes apparent.


Additional Categorization 

In our latest investigation, Jones and I (16) have introduced a different category of disorder with a different dimension of judgment, and a return to the use of responses on comprehension test items. Using 50 such items drawn from Wechsler-Bellevue materials administered to a group of Naval disciplinary cases, 15 of our pool of clinicians were asked to rate on the basis of the asocial tendency revealed in the responses. The responses were selected to represent a wide range of such tendency. Ratings also were obtained from undergraduates. Reliabilities were high, as they had been with the schizophrenic materials. For the clinicians the r's ranged from a high of .92 to a low of .64 with a median of .82. A random splitting of the group gave an r of .94. For the undergraduates the r's ranged from .88 to .51 with a median of .72, again quite high, though below the clinicians. The split-group r was .93. The agreement between clinicians and undergraduates, based upon the mean values of the stimuli, was .91. The difference between the mean values of the stimuli for the two groups was not significant, nor, contrary to our previous findings, was the difference in standard deviations significant.
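
The “random splitting of the group” can be sketched as follows: shuffle the judges, divide them into two halves, and correlate the two halves' mean values for each stimulus. The seed, the helper names, and the data are hypothetical, and since the text does not say whether any correction (such as Spearman-Brown) was applied to the reported split-group r's, none is applied here.

```python
import random
from math import sqrt
from statistics import mean
from typing import List, Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Product-moment correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def random_split_r(ratings: List[List[float]], seed: int = 0) -> float:
    """Correlate mean stimulus values from two random halves of the judges."""
    judges = list(range(len(ratings)))
    random.Random(seed).shuffle(judges)
    half = len(judges) // 2
    n_stimuli = len(ratings[0])

    def half_means(group: List[int]) -> List[float]:
        # Mean rating of each stimulus within one half of the judges.
        return [mean(ratings[j][s] for j in group) for s in range(n_stimuli)]

    return pearson_r(half_means(judges[:half]), half_means(judges[half:]))


# Hypothetical ratings: 4 judges x 5 stimuli.
ratings = [[1, 3, 4, 6, 7], [2, 3, 5, 6, 6], [1, 2, 4, 5, 7], [2, 4, 4, 6, 6]]
print(round(random_split_r(ratings), 2))
```
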

The fact that high reliability or good inter-judge agreement continues to appear as we extend our reference groups by adding new dimensions and a new type of disorder indicates some possibility of generalizing about clinical judgment. We cannot, of course, conclude that all clinicians are reliable in all judgmental situations, but within the limits of our rating technique and those situations we have used, clinical judgment begins to emerge as a reliable technique which can be applied in evaluative situations. Incidentally, it is of interest to note that what individual differences do appear seem to be limited to specific situations. We find no evidence of consistently “bad” judges appearing among our clinicians.


Practical Applications 

Having described our motivation as coming from a blending of 
basic and applied interests, it is fitting at this point to suggest some 
of the practical applications that might follow from it. One im- 
mediate application is the use of scales as teaching devices. The 
presenting of clinical materials scaled to show the orderly quantita- 
tive progress of the dimension in question would seem to be a better 
method of pedagogical communication than the disorderly hodgepodge in which such illustrative material is usually presented. We have published such material for schizophrenic thinking as is exhibited in both vocabulary and comprehension responses (12). Our future program envisages further refinement of the above materials plus the production of scales for the other dimensions we have investigated. As yet we have no experimental data on their pedagogical efficacy, but rough observation of their use indicates that they show promise for teaching purposes.

The value of obtaining reliable scaled judgments of significant
test behavior which previously was only open to broad, qualitative
interpretation is obvious. Through such a technique the previously
subjective and non-quantitative becomes objective and quantitative.
Not only is communication clearer and more precise, but the
possibilities of statistical treatment are increased. It is even possible 
that future tests and test items might be selected against the criter- 
ion of their adaptability to such scaling techniques. 

The practicing clinician will think of many possible specific adapta-
tions. One which we hope to test in the near future is the use of
judgments of potential intelligence to provide an internal scatter
measure with any vocabulary test. The responses on such a test
could be scored in two ways: first, with the usual objective stencil
or rules for “correct” and “incorrect” to give a measure of functional
intelligence; second, with scaled values on the dimension of poten-
tial intelligence as we have used it to give a measure of the sub-
ject's previous (or potential, if the disorder is thought of as reversible)
level of ability. The discrepancy between these two measures
should provide some estimate of the current intellectual deficit.
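
The two-way scoring just proposed can be sketched as follows. Everything concrete here is an assumption made for illustration: the toy key of acceptable definitions, the judged scale values, and especially the rescaling of both scores to a 0 to 1 range so that they can be subtracted. The chapter proposes the comparison but specifies none of these details.

```python
def functional_score(responses, key):
    """Proportion of items scored 'correct' by the usual objective key."""
    correct = sum(1 for item, answer in responses.items() if answer in key[item])
    return correct / len(responses)


def potential_score(scale_values, scale_max=9):
    """Mean judged value on the potential-intelligence scale, rescaled to 0..1.

    Each entry is one judged value (1..scale_max) for one response.
    The 0..1 rescaling is an assumption so both scores share a unit.
    """
    mean_value = sum(scale_values) / len(scale_values)
    return (mean_value - 1) / (scale_max - 1)


def estimated_deficit(responses, key, scale_values):
    """Positive values suggest functioning below the judged former level."""
    return potential_score(scale_values) - functional_score(responses, key)


# Hypothetical four-item vocabulary record.
key = {
    "winter": {"season"},
    "repair": {"fix", "mend"},
    "fable": {"story"},
    "nitrogen": {"gas", "element"},
}
responses = {"winter": "season", "repair": "mend", "fable": "fox", "nitrogen": "air"}
scale_values = [9, 9, 7, 7]  # judged potential level of each response
print(estimated_deficit(responses, key, scale_values))
```

Here half the items pass the objective key while the judged potential level is high, so the sketch reports a positive discrepancy, the quantity the text proposes as an estimate of deficit.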

One of the most intriguing possibilities stems from the ability of
our naive judges to perform in a fashion closely comparable to our
trained clinicians, provided we use broad, general dimensions. This
confirms what common sense and experience have already demon-
strated, that even the man in the street can recognize deviant be-
havior (and this without the advantages of training on our scales!).
If this recognition can be turned into reliable scale judgments, the
use of such scales by nursing aides and ancillary ward personnel has
possibilities. Such scales might be helpful in the courts, in social
work situations, and certainly in the military services. It might even
be possible to turn English professors into diagnosticians, as well
as scholars!


Present Concerns 

As we speak of the future, we reveal one of the great deficiencies
of our work at present. We have dealt so far with reliability and
have not touched diagnostic validity. Some, I think, can be inferred
from our reliabilities, but irrespective of current squabbles concerning
the identity or difference of the meanings of the terms reliability
and validity, validity does point to a realm of practical diagnostic
applicability with which we have not dealt. Particularly pertinent
is the fact that the stimuli used to date have been carefully pre-
selected to offer a representative range of patient responses. What
will happen to the reliability of judgment when clinicians exercise it
on unselected test responses? Will actual, run-of-the-mill tests in
themselves offer sufficiently rich materials for such scaling tech-
niques? Our techniques will be improved as our understanding of
[...]. We have explored some of the factors causing distortions of
judgment in the hope that through their understanding we may
improve clinical performance. That there are [...] common to
all acts of judgment has long been [...] (7). The specific area we
have chosen to investigate is the
effect of the context upon the judged value of the stimulus. Arn-
hoff's original study (1) was of anchoring effects, which are part of
the general context. His failure to find these effects did not dis-
courage us, and we resolved to return to the problem. Fortunately,
at this time the interest of my colleague, Professor Donald T. Camp-
bell, was aroused and his collaboration secured. He suggested a
more sophisticated design that seemed better suited to our purposes 
than the one we had been using. With the assistance of Mrs. Nan 
A. Lewis, we set to work. 

The stimuli were our schizophrenic verbal responses which were
rated for the amount of schizophrenic thinking involved, using a
9-point scale. Northwestern undergraduates provided our subjects.
In essence, our experimental situation consisted of the repeated
presentation of stimuli of median value against a limited context
of stimuli from either the upper or lower halves of the scale; a
transition to the opposite context followed. Context effect was
measured by changes in the judged value of the median stimuli.
Our hypothesis was confirmed; median stimuli were judged higher
when presented in a context of low stimuli and lower when presented
in a high stimuli context (2). For reasons not pertinent here, in-
volving some general problems of psychophysical technique, we
repeated the problem with tones as stimuli, and confirmed our
previous findings (4). We have demonstrated that these effects can
be used as a criterion for the evaluation of different types of scales (3).
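
The measurement of the context effect can be sketched as a simple difference of means. The sketch assumes that the ratings given to the same median-value stimuli have been collected separately in the low and high contexts; the numbers are invented, and the original studies (2, 4) used their own designs and significance tests, which are not reproduced here.

```python
from statistics import mean


def context_effect(judgments):
    """Mean rating of the median stimuli in the low context minus the mean
    rating of the same stimuli in the high context.

    A positive value is in the direction reported in the text: median
    stimuli judged higher among low stimuli and lower among high ones.
    """
    return mean(judgments["low"]) - mean(judgments["high"])


# Hypothetical 9-point ratings of the same median stimuli in each context.
judgments = {"low": [6, 6, 7, 5, 6], "high": [4, 5, 4, 4, 3]}
print(context_effect(judgments))
```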

It was still necessary to bridge the gap between naive under- 
graduates and trained clinicians, and Jones undertook this problem 
(24). In an experiment involving 48 professional clinicians, lo 
clinical psychology graduate student trainees, and 16 naive under- 
graduates, Jones was able not only to demonstrate these effects, but 
also their dependence upon professional experience and previous 
experience in a similar rating situation. In view of the fact that 
Jones's design did not involve the repeated stimulation of the study
by Campbell, Hunt, and Lewis, it is understandable that the effects
were not strong. One of Campbell's students, Dr. Marshall Segall,
has since shown that such context effects can be demon-
strated with social attitudes (28).

It thus seems fair to conclude not only that clinical judgment in 
terms of our scaling techniques is sufficiently reliable at present for 
some practical adaptation, but that further understanding of the 
judgmental situation will lead to greater possibilities in its control,
and hence to more accurate judgments and judges. The importance
of training the judge has recently been recognized in psychophysics
itself. In an article reflecting much of our own orientation in the 
clinical field, Engen and Tulunay say, “The use of unpracticed and 
naive Os as instruments of precision, e.g., in measuring sensory 
magnitudes on a ratio scale, may not be unlike the use of un- 
calibrated physical instruments. With such instruments, constant 
errors, such as those associated with context, may remain unknown. 
It may, however, be feasible to “calibrate” the human O by giving 
him experience with the various types of psychophysical judgments 
and their sources of bias” (6). If such an august discipline as psy- 
chophysics can benefit from such a program of improvement, may 
not the lowly field of clinical psychology aspire to the same goal? At 
the very least we are in good company! 

Increasing Clinical Efficiency

Starke R. Hathaway
University of Minnesota 


The chief contribution of psy-
chology to clinical efficiency is the use of psychometric devices to
effect clinical decisions more quickly and accurately. Efficiency in
clinical testing does not very clearly signify anything by way of
subject matter; the connotation suggests maximum usefulness for
minimum cost and effort. This treatment of that connotative field
will be a broad one within two important limiting conditions. One
limit is suggested by the word “clinical.” The points to follow are
intended to apply most clearly to routine work with clients or
patients who are tested as part of the evaluation of mental handicap
or disorder, but exclusive of special research purposes. A more in-
definite limit will be the primary concern with personality tests and
testing rather than with interests, aptitudes, or intellect.

Why Do We Use Tests?

If one were to go from clinician to clinician conscientiously at-
tempting to estimate the items contributing to the use of tests,
many of the most frequently stressed points would not be amenable
to statistical or other objective evaluation of test efficiency. For
example, some clinical psychologists use tests as a fairly effective
starting point to let them participate in the diagnosis or psychody-
namic formulation of the problems of the patient. This psychologist,
probably fortunately for the development of his science, more often
feels a need for a real or seeming instrumentation, in contrast to
simple opinion from experience, before he can contribute in the
team development of formulation.
The Rorschach and other projective devices have probably gained
a great deal of their popularity because of their adaptability to the
satisfaction of this need. In many cases, a candid psychologist will
sum the whole of his value argument for a test by saying that he
finds it to be prolific in source material for psychodynamic state-
ments (psychological causes) about patients. These statements are
not only valued in the clinical setting in which many psychologists
work, but in a majority of cases there is some real criticism by
clinical peers if test data are not presented as background for dy-
namic evaluations. Although these statements rest in some way
upon data in the test, they are not usually evoked by objective
aspects of the test as the technical requirement of a test is usually
stated, and they mostly depend upon professional hearsay or at most
upon weak data from validity studies. We are often guilty of allow-
ing a halo effect to add weight to our subjectively derived observations
of a patient because the basic observations were made in the course
of administration of an objective test. That is, our statements gain
in staff creditability and in our own belief in them to the extent
that they are associated with other informational items properly
termed objective and validated test data.
An allied utility from using tests is the relief of the clinician from
the responsibility of making a judgment that cannot be based upon
any identifiable procedure or objective datum. The test, even if it
were completely invalid for the situation, nevertheless constitutes a
decision-making device. The need in the psychologist is comparable
to the reason we toss a coin. We try to decide certain ambivalent
situations by shifting the responsibility to an outside event. A test
indication may have more relationship to the decision than a coin
would, but often the real value lies in the provision of a sign used
to dissolve indecision. Complicated test utilities such as these are
often respectable and real, since they are tools for use in clinical
areas that cannot be better handled. Of course, we will give up
such props as we develop more appropriate and valid tests.
At the outset it is important to differentiate between the [use]
of a test or test situation when it is being used in a [manner consistent
with] our more formal understanding of the meaning of the word
“test,” and the quite proper observations that go along with the
administration of a test and which are, perhaps, better thought of
as [data] from a controlled interview situation. The Rorschach provides
a good illustration of this contrast. It is a test when [it yields scores]
and profiles, but it is merely a partially controlled interview when
the clinician uses unscored and unstandardized data to describe
the patient's personality, no matter how much authority and ex-
perience lie behind the interpretations.

Clinical Versus Psychometric Data 

The confusion of psychometric methods with interviews and with
other non-metric methods of gaining information may, in part, have
started from an over-reaction that is historically characteristic of
psychology. The movement toward acceptance of the use of psy-
chometric data in practical clinical application occurred against
a strong tradition in the use of interviews and oral examinations.
Psychiatry was deeply involved with these subjective methods and,
in general, psychologists found it hard to compete. It was not very
effective to pit a test score against the word of the experienced
clinical man. The struggle for recognition of the value of psy-
chometrics may have caused over-reaction against subjective
clinical evaluations to the point where modern psychologists have
neglected the potentialities of the interview and, particularly, the
possibility of developing more objective information from the in-
terview by using methods that control the situation.
If the psychometric aspects of Rorschach technique continue to
show poor efficiency, we must not over-react again and lose the
value of the method for lack of the flexibility to change our concept
of the method. Until forced out of the position by pressure to
appear more scientific, the best clinicians using the Rorschach
depended very little upon formal test methods, such as scoring and
the formalizing of clinical signs. The current Q-technique en-
thusiasm, which illustrates the possibility for measurement derived
from subjectively summarized clinical information, is relieving pres-
sure toward the inappropriate use of objective psychometric methods.
Similarly promising is the increasing development of validated
clinical statements derived from objective test profiles that are
considered as complex patterns or codes rather than individual
scales. For example, social conformity and continuation in school
are related to MMPI codes showing high scale 5 or 4 (5) for children
who have the specified profiles. This kind of validity data com-
pletes the link between objective test profiles and the signs that are
validated from projective devices.
To a certain extent, experienced clinicians have found there is a
gain in clinical efficiency from a simultaneous objective test and
interview. It is hard to teach this skill, and encouragement of the
practice often leads to corruption of the objective test evidence.

In summary, it is probable that we are beginning a period of rapid
development in efficiency by the development of “cook books” with
the scheme exemplified by Meehl (9) and developed by Drake (2).
The distinctive character of this movement is the development of
clinical generalizations about personality from objective test data
that are in some way synthesized. This synthesis may be effected
by use of a code (3) or some other clerical system to skim the va-
lidity cream from a profile of objective personality scale scores.
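
The idea of skimming statements from a coded profile can be sketched in a few lines. The coding rule below (list the scales at or above a T-score of 70, highest first) is a deliberately simplified stand-in, not the published code (3), and the cookbook entry is a placeholder rather than a validated statement.

```python
def high_point_code(t_scores, threshold=70):
    """Scale numbers with T >= threshold, highest first (ties by scale number).

    A simplified stand-in for a profile code; real coding systems record
    more detail than this.
    """
    elevated = [(t, scale) for scale, t in t_scores.items() if t >= threshold]
    elevated.sort(key=lambda pair: (-pair[0], pair[1]))
    return "".join(str(scale) for _, scale in elevated)


def cookbook_lookup(code, cookbook, key_length=2):
    """Return the stored statement for the leading digits of a code."""
    return cookbook.get(code[:key_length], "no validated statement for this code")


# Hypothetical profile: MMPI clinical scales numbered 1-9 and 0.
profile = {1: 55, 2: 78, 3: 60, 4: 82, 5: 50, 6: 65, 7: 72, 8: 70, 9: 58, 0: 45}
cookbook = {"42": "(placeholder for statements validated on code 42 cases)"}

code = high_point_code(profile)
print(code, cookbook_lookup(code, cookbook))
```

The clerical step is the whole point of the design: once a code is formed, any reader, or a clerk, retrieves the same statements for the same profile.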


Who Shall Administer the Tests? 

Tests for clinical use can be divided into those that must be
administered by a clinical psychologist and those that can be ad-
ministered by a psychometrist. The psychometrist can have one or
two years of graduate training or be an in-service trained person
chosen for the ability to get cooperation and observe general rules.
Examples of tests that use professional time are the WAIS, the
Rorschach, the TAT, and a long list of others including many of the
tests for mental deficit. Some of these tests require nearly half a day
for administration, scoring, and evaluation. Examples of tests that
can be administered by a psychometrist are the Shipley, the [...],
the Sentence Completion, and the Porteus. All of these require skilled
interpretation, but the completed profile or answer records are a
starting point. The use of these latter tests is much more efficient,
at least in terms of professional time investment.

With the shortage of clinical psychologists and with the general
resistance of the fully accredited psychologist to doing routine test-
ing, the tests that require skilled professional time for administration
or scoring will need to provide much more value than will the
second type of test before their use is justifiable. Progress toward
efficiency should increasingly [...] as the clinical psychologists feel
more responsibility for efficiency in their routines. At present, one
rarely [sees] cost in professional time [considered] as part of the
validity discussion of new or old tests. Some psychologists seem to
feel that the processes of testing should be [...] and that [...]
values upon the difference between tests in professional time for
administration and scoring. There is a danger that we will go too
far, not use individual tests when they should be used, and, even 
more likely, not give enough time to an interview with the patient 
when we are provided with the test data by someone else. As a 
policy, for example, routine MMPI's can be handled in large num-
bers by one experienced reader who makes a few screening judg-
ments on each profile put before him. But for diagnostic and other
significant decisions, an interview with the patient is imperative.
For efficiency, such interviews by the clinician should be approached 
with a maximum of completed test profiles and other test informa- 
tion in hand. They become directed and efficient checks upon 
the peculiar significance of the test indications as applied to the 
patient and upon the probable validity of the test data. 

It is apparent by now that a great gain in clinical efficiency can 
be made when we face the fact that psychology must accept leader- 
ship in expanding the use of psychometricians. It is inexcusable 
for a clinic to employ psychologists at the Diplomate level and for 
these to spend much of their time scoring or administering tests 
when equal or more information could be obtained by a psycho- 
metrician. But when no psychometric or clerical help is available, 
it is more reprehensible for the clinical psychologist not to use 
tests if he has been hired as a clinical psychologist to bring his 
special skills to the clinical team. Acceptance of this position would 
embarrass many individual psychologists and some large clinical 
programs when Ph. D. level psychologists are exclusively used with 
no provision for psychometricians or clerical workers. 

Test Efficiency and Test Statistics 
Test efficiency has most often been treated by approaches that
suggest practical statistical formulations with parameters that can
be estimated. The following development briefly reviews the prin-
cipal factors in such evaluations to suggest that practical clinical
efficiency often very loosely relates to orderly statistics.

Cronbach and Meehl (1) have expanded the subject of test validity
from the standpoint of research and development of tests. The va-
lidity and reliability of tests are certainly significant items in clinical
efficiency. Statistical equations and fairly precise arguments that
[express] efficiency in terms of estimated parameters depend upon
critical items such as the placement of the cutting point, the amount
of overlap, and the base rate of occurrence of the critical events.
The treatment of test efficiency, as indicated in a four-fold table of
test hits and misses in proper cross validation, is a well-recognized
procedure. Unfortunately, as Meehl and Rosen (8) repeatedly state,
this procedure is rarely presented properly, and the data are not
available for it in the case of most of the personality tests. Estima-
tions of the base rate for ordinary applications together with con-
sideration of the data for advantageous cutting points are even
rarer in our literature. The clinician usually works with very little
of this information even if he is able to state his local problems
properly.

On the whole, as Meehl and Rosen show, application of the
efficiency formulas where we can estimate the necessary parameters
tends toward discouraging conclusions. As methods for desirably
changing the predictive probability of statements about patients and
their problems, many of the tests we use are impractically weak;
some actually lower the predictive accuracy because of their im-
proper cutting scores. We should be much freer not to make state-
ments from test data. Effectively this means that we can administer
a test and refuse to use a proportion of the scores because they fall
in indeterminate areas. Such a practice will permit strong state-
ments about a few of the persons tested even when a weak test is
used. A testing program could sometimes be justified when only
one among a hundred scores can be used rationally.
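
The practice of declining to interpret scores in an indeterminate band can be sketched with two cutting points instead of one. The cut values and scores are hypothetical; in practice the band would be set from validation data of the kind Meehl and Rosen (8) call for.

```python
def classify(score, lower_cut, upper_cut):
    """'positive' at or above the upper cut, 'negative' at or below the
    lower cut, None (make no statement) inside the indeterminate band."""
    if score >= upper_cut:
        return "positive"
    if score <= lower_cut:
        return "negative"
    return None


def usable_statements(scores, lower_cut, upper_cut):
    """Keep only persons whose scores fall outside the indeterminate band."""
    results = {}
    for person, score in scores.items():
        label = classify(score, lower_cut, upper_cut)
        if label is not None:
            results[person] = label
    return results


# Hypothetical raw scores and cutting points.
scores = {"A": 45, "B": 62, "C": 81, "D": 55, "E": 38}
print(usable_statements(scores, lower_cut=40, upper_cut=75))
```

Only two of the five persons receive a statement; widening the band trades fewer statements for stronger ones, which is the exchange the paragraph recommends.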

When efficiency of a test is based upon the data in the four-fold
table, or overlapping frequency distributions, or upon some relation-
ship of these values to the base rate, one must decide what effi-
ciency weight should be given to every one of the types of hits,
errors, and indeterminacies. The problem is rather simple if
only one category is considered significant. For example, if we
select true positives as the exclusive measure of [efficiency] and have a
large and stable supply population, that test giving the highest per
cent in the true positive cell would be the most efficient. Although
some situations approach this simple case, there are probably no
completely consistent examples. Some other evaluation, such as that
of the false positive cases, will always influence the efficiency
in the practical situation. Most often a combination of test
validity categories must be considered. Among these, the per cent
of false negatives is especially likely to be critical. We
are very unhappy if we fail to identify a patient with a brain
tumor because the appropriate test indicator was negative for tumor.
We can, however, accept a few more false positive indications. By
contrast, if a test is intended to predict who will become
severely delinquent, the false positive cases are very embar-
rassing. People are quite tolerant of favorable predictions (false
negatives) even if the prediction is wrong. Efficient evaluations
closely depend upon the special demands put upon the test by
the applied situation and are difficult to generalize.
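
Weighting the cells of the four-fold table can be sketched as follows. The counts and weights are invented to mirror the two examples in the text: one weighting treats a missed case (false negative) as ten times as costly as a false alarm, as with the brain tumor; the other treats a false positive as five times as costly as a miss, as with the delinquency prediction. Both hypothetical tables assume 100 persons with 20 true cases.

```python
from dataclasses import dataclass


@dataclass
class FourFold:
    """Counts from a cross-validated four-fold table of hits and misses."""
    true_pos: int
    false_pos: int
    false_neg: int
    true_neg: int
    indeterminate: int = 0  # scores withheld as uninterpretable


def weighted_efficiency(table, weights):
    """Sum of cell counts times weights (positive = value, negative = cost)."""
    return (
        table.true_pos * weights["true_pos"]
        + table.false_pos * weights["false_pos"]
        + table.false_neg * weights["false_neg"]
        + table.true_neg * weights["true_neg"]
        + table.indeterminate * weights.get("indeterminate", 0)
    )


# Two hypothetical cutting scores applied to the same 100 persons (20 true cases).
lenient = FourFold(true_pos=18, false_pos=30, false_neg=2, true_neg=50)
strict = FourFold(true_pos=12, false_pos=8, false_neg=8, true_neg=72)

tumor_weights = {"true_pos": 1, "false_pos": -1, "false_neg": -10, "true_neg": 1}
delinquency_weights = {"true_pos": 1, "false_pos": -5, "false_neg": -1, "true_neg": 1}

for name, weights in (("tumor", tumor_weights), ("delinquency", delinquency_weights)):
    print(name, weighted_efficiency(lenient, weights), weighted_efficiency(strict, weights))
```

Under the tumor weights the lenient cut scores higher (18 against -4); under the delinquency weights the strict cut does (36 against -84). The same two tables rank in opposite order, which is why efficiency is hard to generalize apart from the applied situation.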

Fortunately, the samples on which we usually work are enriched 
samples; they have a larger base rate than does the general popula- 
tion. Using a test of schizophrenia for hospital patients, we know 
that the rate of the criterion condition will be higher than among 
non-patients. Real suicidal intent is rare in general populations, but 
among hospital patients who look depressed, it is many times more 
common (10). Our tests are fortunately used so that they operate 
as successive hurdles in probability. This can help the efficiency by 
pushing the base rate of the critical event upwards. Arguments 
such as these mitigate the dismal picture of clinical test efficiency
that is suggested by candid consideration of the available statistical
facts.
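
The effect of an enriched sample, and of successive hurdles, can be sketched with Bayes' theorem. The sketch assumes a test sign with fixed sensitivity and specificity (both 0.80 here, an invented figure) and, for the second hurdle, that the two screens err independently given the true condition. Neither assumption is made in the text.

```python
def positive_predictive_value(base_rate, sensitivity, specificity):
    """Probability that a person showing the test sign has the condition."""
    true_pos = base_rate * sensitivity
    false_pos = (1 - base_rate) * (1 - specificity)
    return true_pos / (true_pos + false_pos)


SENS, SPEC = 0.80, 0.80  # hypothetical operating characteristics

for base_rate in (0.01, 0.20):
    ppv = positive_predictive_value(base_rate, SENS, SPEC)
    print(f"base rate {base_rate:.2f}: P(condition | sign) = {ppv:.3f}")

# Successive hurdles: those passing the first screen form a new, enriched
# population whose base rate is the first screen's predictive value.
after_first = positive_predictive_value(0.20, SENS, SPEC)
after_second = positive_predictive_value(after_first, SENS, SPEC)
print(f"after two hurdles: {after_second:.3f}")
```

With these figures the sign is right about 4 times in 100 in a population where the condition occurs at 1 per cent, half the time where it occurs at 20 per cent, and about 8 times in 10 after a second hurdle.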

In summary, statistically evaluated test efficiencies can be applied 
when proper parameters are available or can be estimated. These 
efficiencies must be rationalized for the special test application 
according to the base rate and the importance of each of the various 
kinds of hit and error. We should be more hard-headed about the
significance of these equations and tables, since it can often be 
demonstrated that no way of treating the data would justify routine 
use of some of our tests, and in other cases it would become ap- 
parent that a test is useful only under restricted conditions. 

In the long run, inefficiency in the use of clinical resources is
undesirable, and the psychologist should be foremost among those 
who are concerned. There can be no doubt that a great deal of 
the published material on tests, and much of the routine application 
of tests, would appear weak if we chose to be critical. This is true
even when more effective cutting points are selected or when other
special conditions are favorable for increasing efficiency. Optimal
adjustment of cutting points or identification of the sharpest test
use cannot usually be achieved in ordinary clinical practice.

Why Diagnose? 

Clinical psychologists have been the leaders in criticisms of 
current dynamic formulation and diagnostic systems and have de- 
scribed new factors, tests, and patterns as suggestions for improve- 
ment. Neither the new traits and diagnoses they provide nor the 
new tests have been established as convincing heuristic or predictive 
devices. One must admit that the diagnoses for functional mental
disorders do not have the desirable properties of good medical
diagnoses.

The most popular tests we use today, as well as our other diag-
nostic procedures, are devoted to the production of diagnoses that
will approximately fit the professional culture. We use the tests
that will provide the proper statistics to indicate the rates of the
traditional mental illnesses, that show the types of patients handled,
or that provide a basis for disability or legal status. There are rarely
any crucial elements except these cultural ones involved in deciding
whether a patient is schizophrenic or a severe neurotic.

In a practical application we do not always want to predict who
would be diagnosed schizophrenic by a clinic. We really want to
know which persons will become, or presently are, ill with the so
far incompletely defined illness. If there is a morbid unity [...]
to evoke the diagnosis of schizophrenia, it is obvious that persons
with this disorder are often given some other diagnosis or called
normal. This is apparent from the fact that clinicians and clinics
do not show high reliability (7). What we aim to predict
with a test is the true illness, schizophrenia, and the diag-
nostic system can be thought of as another test with [its own]
efficiency. Either the clinical diagnosis or the test could be the
better indicator.
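
The point that the diagnostic system is itself a fallible test can be illustrated with the classical correction for attenuation from test theory, which the chapter does not invoke by name. Treating inter-clinician agreement on diagnosis as the reliability of the criterion is an assumption, and the coefficients below are invented.

```python
from math import sqrt


def max_observable_validity(test_reliability, criterion_reliability):
    """Classical upper bound on the test-criterion correlation."""
    return sqrt(test_reliability * criterion_reliability)


def disattenuated_validity(observed_r, test_reliability, criterion_reliability):
    """Correction for attenuation: estimated correlation of true scores."""
    return observed_r / sqrt(test_reliability * criterion_reliability)


# Hypothetical coefficients: a fairly reliable test, a less reliable diagnosis.
test_rel, diagnosis_rel = 0.81, 0.49
print(round(max_observable_validity(test_rel, diagnosis_rel), 2))
print(round(disattenuated_validity(0.45, test_rel, diagnosis_rel), 2))
```

With these figures the observed correlation with diagnosis cannot exceed about .63, and an observed .45 corresponds to an estimated true-score correlation near .71. A modest validity coefficient against diagnosis may therefore say as much about the diagnoses as about the test.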

For a sophisticated appraisal, we [should check the] test and
the clinic against multiple symptoms [and other] items that will permit
a relative decision about the efficiency [of each]. We have too much
disregarded this aspect of [validity], tending to assume either that
it is good enough for a test to be developed for agreement with
clinical diagnoses (which means probably limiting the validity), or
that some arbitrary [degree of] disagreement with diagnoses invali-
dates a test. It seems that, of these two unhappy alternatives, it is
better to choose the former unless there is no value in making
clinical diagnoses.

It is interesting that no one has broken [with the] Kraepelinian
system. Good arguments against it are taught to all clinical psy-
chology students. Similarly, constructive ideas have been
developed and partly established in practice. Apparently no
one has had [the] prestige [...], and no one has really
offered a convincingly better system. Even Adolph Meyer and
Freud have changed no more than a few small points. The new
official [...] are never more than [...] classifications. Psychologists
have tried to hold aloof and unsullied by the use of their tests
for supporting a classification that
is often exemplified by majority or prestige vote in a case con- 
ference; but as clinical psychology accepts more real clinical re- 
sponsibility, this aloof position is untenable. Undeniably the 
practicing majority of clinical peers and the common professional 
language are changing very little. It is in one way understandable 
enough that psychologists have caused so little change. The psy- 
chologists suffer from a plethora of prophets with different theses 
and these have little in common except the call for a change. 

Summarizing the foregoing argument, it could be that some ob- 
jective clinical tests have been unjustly accused of low validity and 
efficiency. If present-day clinical practice is largely based upon 
arbitrary diagnoses and dynamic formulations rather than upon 
theoretically or scientifically demonstrated indications, test effi- 
ciency may now be nearly as high as the predictable upper limit for 
such arbitrary symptom conglomerates. This thought is a more 
cheerful one, and the clinical literature is providing some support 
for it. More and more useful validity items are appearing to sup-
port an increasing faith in the objective approaches to personality.

What Is Needed 

What we need today in routine practice, and what we can now
get from our testing and clinical studies, is uniformity in the diag-
nostic reactions of clinicians to the traditional selection of diagnostic
symptoms. It is desirable to increase the agreement among clini-
cians and clinics in applying the accepted diagnostic terms and in
making other ordinary clinical decisions. The diagnostic agreement
we now achieve is very likely based upon nothing more than train-
ing drills that focus attention upon arbitrary symptomatic signs
that have no basic meaning. A similar situation exists in psychody-
namic formulations. Few of these formulations are validated as
indicators of predictively useful behavior sequences or better ther-
apy so as to make one formulation superior to another. Here again,
custom and clinical prestige are more involved than is any estab-
lished fact about the patient's diagnosis or treatment.

If we conclude that the diagnoses and dynamic formulations of 
present clinical practice derive more from the cultural uniformity 
of the professional clinician than from the nature of the patients'
illness, it becomes reasonable to increase test efficiency by using
the test to increase the uniformity of clinical decisions. A reduction
of the argument for test use to so low a theoretical level will in-
furiate some psychologists, but it provides for a real security and
useful contribution to those who are clinically operating in the cul-
ture. Starting with this point of view, test devices have high effi-
ciency when they provide maximum uniformity and coherence in
diagnostic and psychodynamic formulations together with the usual
psychometric qualities of objectivity and reliability. In effect, we 
would gain in efficiency if we abandoned the attempt to validate 
personality tests by increasing the agreement with diagnoses or 
clinical estimations beyond a certain point. Advances in validity, 
once an approximation to criteria of clinical usage is reached, should 
come more in the form of construct development and in improving 
the purity of the generalizations from test data. 

In this way of thinking, the efficiency of certain tests will steadily
rise as the clinical jargon, which develops from the objective data
provided in the test signs or scores, becomes more widely used.
The universality of patient-descriptive language is advanced by
this. This development could make it more possible for different 
clinics to select more nearly replicable groups of patients for special 
treatment or study. Objective tests with standard conditions are a 
necessary prelude to the development of better diagnostic and thera- 
peutic science. Projective devices and expert diagnosticians are not 
a substitute for this. 


A Case in Point 

Among numerous illustrations which could be selected, the Pd
score of the MMPI provides a good example. The Pd score was
originally aimed at producing agreement with clinical staff di-
agnoses. The criterion groups were made up of various patients
diagnosed psychopathic personality. No one has ever thought that
this diagnosis signified a constant disease or even a stable pattern
of symptoms. But even before the Pd score was derived, it had
been suggested, on clinical and psychometric data (4), that there
was a smaller group of persons among the psychopaths who were
much more similar in symptoms, behavior, and other items. They
have gradually emerged as a useful type, and today the psycho-
pathic deviate or asocial sociopath is one of the clearest psycho-
logical constructs. The long-suspected defect in emotional func-
tion within this sub-group appears to have been substantiated by
Lykken's (6) finding of deficient autonomic (anxiety?) condition-
ability. If such new indicators serve to link familiar and [useful]
clinical data to more discriminative psychometric scales, test effi-
ciency will increase although the agreement with the older diag-
nosis may even decrease.

We would, of course, prefer that decisions from tests should rest
upon determined validities and established improvement over base
rate predictions. The average clinical situation is far from this
ideal at present, and if one insisted upon practicing according to
the ideal, one would find it hard to use personality tests.

Many clinical psychologists have a tendency to quit using
psychometric data, possibly because they become uncomfortable with
the suspicion or evidence that their test data are not improving the
accuracy of decision. It may also be that they are impressed with
the fact that a majority of the psychiatrists, and probably most of
the clinical psychologists in private practice, do not use tests. If
laziness is the real reason a psychologist does not use tests, it is
hard to prove. When the clinical psychologist does not use tests, he
has ceased to identify himself with the scientific future of his
profession. However, if diagnostic or other contributions of a testing
procedure do not sufficiently allay the insecurity of the clinician,
do not provide him with useful methods of procedure, or do not
protect him from professional danger in proportion to the energy and
time that he must invest, he can be expected to abandon the procedure.

We must steadily improve efficiency or lose out in this special
field. I wonder what would survive of the present test programs
if the work of the clinical psychologists were evaluated for efficiency
by considering the expenditure of professional time and energy in
view of the new and useful information provided. I suspect that
better than 40 per cent of all the routine clinical testing now done
by clinical psychologists would be immediately abandoned. Of
course there is comfort in the fact that the field of psychological
medicine is full of inefficiency. Even admitting the limitations,
testing stands much above most other clinical procedures. If we
are in danger of becoming anxiously pessimistic, we need only to
review the solid standing of attitude, interest, and ability
evaluations. These achievements show the basis for expectation that
we will develop more efficiency.

Future Impact of Psychological
Theory on Personality
Assessments

James G. Miller 
University of Michigan 


Michael Todd's moving picture “Around the World in 80 Days”
included, in addition to the primary stars, a number of other
outstanding actors, stars in their own right, who consented to play
brief vignettes called “cameo parts.” And in such a sense we can all
say, not “I Am a Camera,” but “I Am a Cameo.” We are living,
behaving systems surrounded by a skin which protrudes into
space-time in an engaging variety of patterns. The skin is the
boundary which separates us from the environment. This surrounding
world is the subject matter of a number of environmental sciences,
most of which are physical or biological. In making their
observations these disciplines employ the dimensions of the
centimeter-gram-second system and its derivatives, like temperature.
In addition they are beginning to employ in some situations a new
sort of dimension, the units of information theory.

The various environmental sciences derive a unity from this joint
use of dimensions and the classical conceptual system which
underlies them. Upon crossing the boundary of the skin, however,
things change. Occasionally the units of the environmental sciences
are used, but much more characteristically the dimensions measured
in behavioral and personality assessment are entirely different and
quite unrelated.

The psychology of personality usually measures traits with names
now used technically after having originally been parts of common
speech. This fact was dramatized a few years ago when Allport
and Odbert (1) went through an ordinary dictionary and compiled
a list of words used to describe personality or behavior, hoping
to get a full sampling of all possible characteristics.

The way someone trained in physics or engineering would
technically describe the action of an automobile and the way a
personality psychologist would describe the behavior of a person are
strikingly different. The engineer would say that a car has a weight
of 3400 pounds and that its previous acceleration at full throttle
was 24 ft per second per second. Now, however, it has a small pebble
of 3/16 inch average diameter in the gasoline feedline of 1/4 inch
diameter. As a consequence the rate of gasoline flow at maximum
throttle has been diminished from 3 gallons to 1 quart per hour, so
the maximal acceleration is now only 6 ft per second per second. If
the car were a human being, a student of personality would say that,
although it is a somatotonic body type, it had formerly rated four on
a scale of alertness. This trait was now diminished to a rating of
one, and sluggishness had increased from two to five.

This striking difference arises partly from the divergences among
the ways we measure the action of human beings and of nonliving
systems. A person customarily serves as an observer of the acts of
other persons, without intervening instruments to quantify the
observations. Environmental sciences characteristically use precisely
calibrated instruments, and therefore they eliminate the human error
involved in rating.

Some observational activities (for example, pattern recognition)
must be done by human beings because as yet we do not have
adequate instruments for them. Pattern recognition (of faces,
sequences, similar designs) is essential in all science. Content
analysis of verbal or written communications is another activity in
which a human rater must be employed. Continuing efforts have been
made to discover ways to replace uncalibrated human raters with
more precise instruments in such fields as pattern recognition and
content analysis. And as long as human beings are used, the
linguistic and other difficulties involved in rating should be
recognized and everything possible must be done to correct for them.
An effort in this direction is “A Glossary of Some
Terms Used in the Objective Science of Behavior” by Verplanck 
(9), compiled admittedly in the Skinnerian frame of reference but 
broadly valuable for clarifying terminology. 

Recently I glanced through a typical personality assessment in
what might fairly be called an average well-written clinical case
history. It was based on the administration of a large battery of
the best-known personality tests. I selected a few of the chief
personality traits measured in the evaluation, with titles derived
from ordinary English prose. Such dimensions, I believe, should be
replaced by more objective variables. Now I will list a few of those
words which we have all used or seen in such situations and
suggest how they could be made more precise by minimizing
opportunity for human error and wherever possible employing
centimeter, gram, second units, information units, and derivatives:

Ambivalent. If this trait is demonstrated in overt behavior, a 
statement concerning the type of physical vacillation between two 
goal objects, its rate, and duration might well be substituted. If it
refers to symbolic behavior, the statement could describe in similar 
quantitative fashion the amount of vacillation which occurs among 
various symbol categories identified by content analysis. 

Dominant. If this concerns overt behavior, it might be quantified 
in the way the peck order of animals has been. Dominant sym- 
bolic behavior can be measured by some apparatus like the inter- 
action chronograph, which records the number of times in an hour 
the subject interrupts someone else and the number of seconds of 
overlap between his speech and that of others with whom he is 
conversing. 
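
The two chronograph scores just named can be illustrated in code. The sketch below is hypothetical: it does not describe how the interaction chronograph is built, and it assumes each speaker's talk has already been logged as (start, end) intervals in seconds. An interruption is counted, by assumption, whenever the subject begins speaking while the other person is already talking.

```python
Interval = tuple[float, float]  # (start, end) of one stretch of speech, in seconds


def overlap_seconds(subject: list[Interval], other: list[Interval]) -> float:
    """Total seconds during which both people are talking at once."""
    total = 0.0
    for s1, e1 in subject:
        for s2, e2 in other:
            total += max(0.0, min(e1, e2) - max(s1, s2))
    return total


def interruptions(subject: list[Interval], other: list[Interval]) -> int:
    """Times the subject starts talking while the other person is mid-utterance."""
    return sum(1 for start, _ in subject if any(s < start < e for s, e in other))


def interruptions_per_hour(subject: list[Interval], other: list[Interval],
                           session_seconds: float) -> float:
    """Interruption count scaled to a rate per hour of observation."""
    return interruptions(subject, other) * 3600.0 / session_seconds


if __name__ == "__main__":
    other = [(0.0, 10.0), (20.0, 30.0)]
    subject = [(8.0, 15.0), (25.0, 27.0), (40.0, 45.0)]
    print(interruptions(subject, other))                  # 2
    print(overlap_seconds(subject, other))                # 4.0
    print(interruptions_per_hour(subject, other, 600.0))  # 12.0
```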

Weak ego. The usual vague clinical impressionism might be 
replaced by a precise measure of the number of decisions made by 
the individual for himself as compared with those made for him 
by others during a standard period of time. The daringness of his 
decision-making strategies might be evaluated by a test developed 
in the framework of game theory. 

Low need achievement. This could probably be quantified by 
measuring on a utility scale of some sort (like money) the rewards 
a person requires to carry out certain standard acts. The more 
such rewards are required, the lower his need for achievement.

High need aggression. The number of hostile overt or symbolic 
acts of an individual can be counted during a standard period of 
time under standard circumstances. 

Schizoid. The percentage of characteristically abnormal metaphors
or grammatical constructions in a standard sample of talk
can be measured. So can the number of times that hallucinations 
are reported or that the patient appears to be hallucinating. 

Moderate word fluency. The number of bits per minute that the 
individual can read aloud or silently with correct responses con- 
cerning comprehension can be measured. 

High average performance I.Q. Commonly we measure this by 
total score from various items or subtests in a test like the Wechsler- 
Bellevue. The relationships of these subtests one to another are 
not well understood. Instead we can administer tests involving 
a known amount of complexity measurable in bits, determining the 
rates at which a person can solve problems of different known de- 
grees of complexity. Such a method can be applied with equal 
objectivity to human beings and to lower animals. 
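
One way to turn such data into a rate is sketched below. It is an illustration under stated assumptions, not a method given in the text: it assumes solution time grows roughly linearly with problem complexity in bits, fits that line by ordinary least squares, and reads the reciprocal of the slope as bits solved per second. The problem sizes and times in the example are made up.

```python
def processing_rate(bits: list[float], seconds: list[float]) -> tuple[float, float]:
    """Least-squares fit of seconds = intercept + slope * bits.

    Returns (bits_per_second, intercept_seconds), where bits_per_second = 1 / slope.
    """
    n = len(bits)
    if n < 2 or n != len(seconds):
        raise ValueError("need at least two (bits, seconds) pairs of equal length")
    mean_b = sum(bits) / n
    mean_t = sum(seconds) / n
    sxx = sum((b - mean_b) ** 2 for b in bits)
    sxy = sum((b - mean_b) * (t - mean_t) for b, t in zip(bits, seconds))
    if sxx == 0:
        raise ValueError("problems must differ in complexity")
    slope = sxy / sxx  # extra seconds needed per additional bit
    if slope <= 0:
        raise ValueError("times do not increase with complexity; no rate defined")
    intercept = mean_t - slope * mean_b
    return 1.0 / slope, intercept


if __name__ == "__main__":
    # Hypothetical solution times for problems of 1, 2, 3 and 4 bits.
    print(processing_rate([1, 2, 3, 4], [5.0, 9.0, 13.0, 17.0]))  # (0.25, 1.0)
```

Because the same fit applies to any subject who produces solution times, it illustrates how a bits-based scale could be used with human beings and animals alike.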

There is more or less general agreement among psychiatric
clinicians concerning the standard form of diagnostic evaluation;
certain published procedures for determining mental status are quite
widely followed. But most of the items of such evaluations are
qualitative rather than quantitative, or only quantitative in the
roughest sense. For example, the number of digits which can be
repeated forward and backward, or a series of informational
questions of graded difficulty, is given as basis for a rough estimate
of effective I.Q. Various efforts have been made by Wittenborn
et al. (10), Malamud and Sands (4), and others to develop more
quantitative scales of psychiatric status: instruments which can be
used not only when the patient is first examined, but repeatedly,
to indicate day-to-day changes in some quantitative fashion. Even
though these scales are in many ways improvements over the usual
clinical method, they suffer from the psychometric shortcomings of
other rating scales. These shortcomings include: disagreement from
rater to rater concerning the definition of the variable being rated;
inter-rater differences in the amount of experience in rating the
variable and in the sorts of patients remembered as seen previously
(the reference population); lack of a known zero point and of equal
intervals between various steps in the scale; nonlinearity of the
variable; and lack of orthogonality between variables.
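
Rater-to-rater disagreement, the first shortcoming listed, can at least be measured. A common index (not one proposed in this paper) is Cohen's kappa, which compares the observed agreement between two raters with the agreement expected by chance from each rater's own distribution of ratings. A minimal Python version follows; it treats scale steps as unordered categories, so it says nothing about the zero-point, interval, or linearity problems also listed above.

```python
from collections import Counter


def cohens_kappa(rater1: list, rater2: list) -> float:
    """Chance-corrected agreement between two raters scoring the same patients."""
    n = len(rater1)
    if n == 0 or n != len(rater2):
        raise ValueError("both raters must score the same, non-empty set of cases")
    observed = sum(a == b for a, b in zip(rater1, rater2)) / n
    c1, c2 = Counter(rater1), Counter(rater2)
    # Agreement expected if each rater assigned categories independently at their own rates.
    expected = sum(c1[k] * c2[k] for k in set(c1) | set(c2)) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when both raters use a single category")
    return (observed - expected) / (1.0 - expected)


if __name__ == "__main__":
    # Hypothetical ratings of six patients on a three-step scale.
    first = [1, 1, 2, 2, 3, 3]
    second = [1, 2, 2, 2, 3, 1]
    print(round(cohens_kappa(first, second), 3))  # 0.5
```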

Perhaps the factor analytic method for determining the fundamental
psychological dimensions is the best available. Certainly this
procedure has effectively simplified the dimensionalities of the
primary mental abilities, of personality, and of meaning. The
method, however, is limited by the quality of the testing
instruments which provide the basic data, including often the
limitations of the human being as rater and evaluator. Moreover,
the derived factorial dimensions do not have known relationships to
the dimensions of the environmental sciences.

I am not necessarily committed to the dimensions of natural sci- 
ence if someone can suggest others which are equally good or 
better for measuring the interactions of the individual and his en- 
vironment and consequently for advancing the unity of physical 
and behavioral science. Lacking such an alternative set of dimensions,
we have proceeded to construct a series of tests in which the
subject reacts to some electronic or other physical apparatus. We 
measure his performance, not along classical psychological dimen- 
sions, but on the sorts of dimensions that might be used by an elec- 
tronics engineer if the subject were a component in the electronic 
system. That is, we determine his personal “transfer function” in 
this system, using C.G.S. units, derivatives, and information units. 

The Driving Battery 

One set of such test situations is our driving battery (5). These 
tests have the advantage of face validity, being widely used for 
measuring driving skills in everyday life. On the other hand, they 
perhaps do not measure “pure” behavioral variables, certain aspects 
of the scores really being artifacts of the particular apparatus being 
used; for example, braking time in a driver-trainer is not the same
thing as a simple reaction time.

The first piece of equipment in this driving battery is the Ameri- 
can Automobile Association’s “Auto Trainer.” This apparatus con- 
sists of two parts: the first includes all the controls of a conventional- 
shift automobile— starter button, speedometer, steering wheel, gear- 
shift lever, ignition key, and accelerator, brake, and clutch pedals; 
the second part is a treadmill-like belt about 10 feet long, which 
extends out from the front of the control unit. The belt, painted
to resemble a tortuous roadway, revolves when the controls are 
in gear, the speed being controlled by the accelerator. In our experi- 
ment, however, the apparatus was modified so that the speed could 
be set by the experimenter’s controls at a constant fast rate (equiv- 
alent to approximately 20 mph) or a slow rate (approximately 10 
mph). 

A small model car, the steering mechanism of which is controlled 
by the steering wheel of the control unit, rests on the belt, its wheels 
turning as the belt revolves, and the speed of the belt determines 
its apparent speed. The task of the subject is to steer the car so 
that it remains in the center of the roadway painted on the belt.
A red and a green light are situated at the far end of the belt unit.
When the green light is on, the driver is to proceed; when the red
light appears, he is to stop the car as rapidly as possible by
depressing the brake.

An accuracy counter, a reaction timer, a trial timer, and speed
controls face the experimenter at the side of the control unit, out
of sight of the subject. A foot switch with which the experimenter
can turn on the red light is also connected to the side of the unit.
Large staples are embedded in the “roadway” every 3 inches. If
the car is kept in the center of the roadway, it makes contact with
the staples, completing an electrical circuit and advancing the
accuracy counter 1 unit. The reaction timer measures in hundredths
of a second the time elapsed between the appearance of the red
light and the brake-pressing response.

The subjects are given trials as follows: 20 revolutions of the belt
at a fixed slow speed, 20 at a fixed fast speed, and 20 at a speed
controlled by the subject. Six reaction time determinations are
interspersed irregularly through each of the 3 trials.

On the driving test, scores are obtained for accuracy at the fixed
low speed, at the fixed high speed, and at the variable speed
controlled by the subject. The unit of measurement is the number
of staples over which the car passes. Since the staples are embedded
in the center of the roadway, the subject has to keep the car in the
middle of the road to activate the accuracy counter. A time score
is obtained, indicating the time required for each trial when the
subject is controlling his own speed. During this phase of the test
the subject is asked to drive as rapidly and accurately as he can.
A derived score is also figured: the ratio of the difference between
the accuracy score at low fixed speed and the accuracy score at
subject-controlled speed, divided by the time score. This ratio,
which indicates the degree to which speed is sacrificed for
accuracy, or vice versa, may be interpreted as a measure of
judgment. Reaction times for the brake-pressing response are
taken while the car is being driven at low fixed speed, at fast fixed
speed, and at variable speed.
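
The derived score described above is simple arithmetic; a minimal Python sketch follows. The function and argument names are hypothetical, accuracy is counted in staples and time in seconds, and the reading in the comment (a larger value meaning more accuracy given up per second of self-paced driving) is a gloss, not a statement from the text.

```python
def speed_accuracy_ratio(accuracy_low_fixed: int, accuracy_self_paced: int,
                         time_self_paced: float) -> float:
    """(accuracy at low fixed speed - accuracy at self-paced speed) / self-paced trial time.

    Accuracy is the number of staples contacted; time is in seconds. Larger values
    suggest (as an interpretation only) that accuracy was traded away for speed.
    """
    if time_self_paced <= 0:
        raise ValueError("trial time must be positive")
    return (accuracy_low_fixed - accuracy_self_paced) / time_self_paced


if __name__ == "__main__":
    # Hypothetical subject: 180 staples at low fixed speed, 150 when self-paced,
    # with the self-paced trial completed in 60 seconds.
    print(speed_accuracy_ratio(180, 150, 60.0))  # 0.5
```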

Steadiness. The steadiness test is an adaptation of the … Test.
The test panel contains a series of holes decreasing in size from
7/16 in. to 3/16 in. The subject is asked to insert a metal stylus
1/8 in. in diameter into each of the holes and hold it there for
15 seconds without letting it touch the sides of the hole. The
apparatus is wired so that a timer is activated whenever the
stylus touches the sides of the hole during the 15-second test period.
Scores are obtained for three trials on each of the five holes, repre- 
senting the total amount of time that the stylus touched the rim of 
the hole. 

For the visual tests we employed the master model Ortho-rater 
constructed by the Bausch and Lomb Optical Company. This device
is designed to present slides for testing various visual functions,
with distance and illumination controlled. It consists of 2 octagonal 
slide-holding drums set inside a boxlike apparatus. A binocular 
eyepiece is located at one end of the box. One of the drums is much 
closer to the eyepiece than the other and is used for testing near 
vision; the farther drum is used for testing distant vision. The test 
slides are fastened to the drum and are easily changed by rotating 
the drum with an external handle. 

Standard Ortho-rater testing procedures are used for seven visual 
tests. Acuity is determined for both far and near vision; depth per- 
ception scores are determined for distant vision only. Vertical and 
lateral phorias for both near and far vision are also measured. 

Precise Measurement of “Pure” Behavioral Variables 

We have gone beyond the use of such lifelike measuring situ- 
ations as the driver trainer, trying to measure “pure” or isolated 
behavioral variables. On occasion it may be possible to equate 
functions of known anatomical subsystems with the separate be- 
havioral functions, but perhaps more frequently this is not possible. 

In the literature of human experimental psychology are studies 
with various apparatuses — often electronic — designed to measure 
isolated functions that have a high degree of precision and relia- 
bility. We have either adapted existing tests to purposes of per- 
sonality evaluation under normal or psychopharmaceutical stress 
conditions or made new tests of our own for special purposes. We 
have interpreted findings from these testing methods in terms of 
our general behavior systems theory approach (6). The following 
are some of the tests which Gerard and I have been using in drug 
studies.

The Tanner Auditory Perceptual Apparatus. The observer is seated 
in an individual booth and has in front of him four lights. The first 
light is a warning flash. The second indicates the signal interval, and 
may flash one or two times, depending on the experiment. The 
third light indicates the answer interval during which the observer 
indicates what signal he heard, or where he heard it. The fourth 
light is the machine answer light, during which the observer is
told the correct answer. The observer wears earphones and through
them hears a constant background of noise, which is set at an
amplitude of 7 volts. This is turned off only during practice runs.
When the actual run has started, the technician lets the observer
hear the signal without noise first, then noise is added, and a short
practice is allowed. Then 100 presentations are given by the
electronic equipment. IBM cards are automatically punched,
recording the observer's answers on each of the presentations.

The performance is scored in terms of the decision-making theory
of signal detection developed by Tanner and Swets (7). It is stated
that under given experimental conditions there is one distribution
of noise and another distribution of tone plus noise, and these
distribution curves overlap somewhat. The quantity d′ is defined as
the difference between the means of the noise and the tone-plus-noise
distributions, divided by the standard deviation of the noise
distribution. This d′ is measured as a function of the tone-to-noise
ratio. Thresholds are obtained both when the observer has to rely on
his own memory of a previous tone amplitude and when the previous
amplitude, with which comparison is made, is presented by the
apparatus. In another set of experiments judgments are made as to
whether a second tone is lower or higher than the first, both when
the observer has to remember the previous tone and when the
previous tone is repeated for him by the apparatus. The threshold
for Gaussian noise is also determined. Such measurements have
been made by us with subjects under several psychopharmacological
drug stresses, and we are studying effects of the drugs on some of
the perceptual parameters.
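
The definition of d′ given above, and two standard ways of estimating it from an observer's responses, can be sketched in Python. This is the textbook equal-variance Gaussian model, offered as an illustration rather than as the scoring program actually used with the Tanner apparatus: `d_prime_from_samples` follows the definition in the text directly, `d_prime_yes_no` estimates d′ from hit and false-alarm rates, and `d_prime_2afc` converts proportion correct in a two-interval forced-choice task.

```python
import math
from statistics import NormalDist, mean, stdev

_z = NormalDist().inv_cdf  # inverse standard normal CDF; raises StatisticsError at 0 or 1


def d_prime_from_samples(noise: list[float], signal_plus_noise: list[float]) -> float:
    """(mean of tone-plus-noise - mean of noise) / standard deviation of noise."""
    return (mean(signal_plus_noise) - mean(noise)) / stdev(noise)


def d_prime_yes_no(hit_rate: float, false_alarm_rate: float) -> float:
    """Yes/no estimate z(H) - z(F); both rates must lie strictly between 0 and 1."""
    return _z(hit_rate) - _z(false_alarm_rate)


def d_prime_2afc(proportion_correct: float) -> float:
    """Two-interval forced choice: Pc = Phi(d' / sqrt 2), hence d' = sqrt 2 * z(Pc)."""
    return math.sqrt(2.0) * _z(proportion_correct)


if __name__ == "__main__":
    # Hypothetical internal responses on noise and tone-plus-noise presentations.
    print(d_prime_from_samples([0.0, 1.0, 2.0], [2.0, 3.0, 4.0]))  # 2.0
    print(round(d_prime_yes_no(0.84, 0.16), 2))
    print(round(d_prime_2afc(0.76), 2))
```

The factor of √2 in the forced-choice case arises because the observer compares two noisy observations, so the difference between them has a standard deviation √2 times that of either one.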

Kristofferson Visual Apparatus. This apparatus (3) employs a
four-interval, temporal forced-choice psychophysical method. The
subject has a sequence of trials made up of four successive time
intervals separated by clearly audible sounds. In every trial a visual
target of circular luminescence subtending one degree at the eye
is superimposed on a large uniform background of moderate
luminescence at one of four time intervals. Which interval is so
activated is randomly determined. This target is presented at the
fixation point in a location known exactly by the subject. The
exposure duration is 0.010 seconds. Fifty successive trials define a
unit and require twelve minutes to complete. After one minute's rest
the trials can be repeated. Consequently it is possible to derive
temporal activity curves as well as dosage curves in this procedure.
In work so far carried out we have a definite indication of drug
stress effects quantifiable in this situation.
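
Scoring for this procedure can be sketched as follows, under stated assumptions. Chance performance with four intervals is 25 per cent correct. The Python below (hypothetical names throughout) groups trials into the 50-trial units described above, places each unit on a time axis using the twelve-minute unit plus one minute of rest, and applies the common correction for guessing, (p - 1/4)/(1 - 1/4); that correction is a standard psychophysical convention, not something stated in the text.

```python
CHANCE = 1 / 4              # four intervals, one of which holds the target
UNIT = 50                   # trials per unit
MINUTES_PER_UNIT = 12 + 1   # twelve minutes of trials plus one minute of rest


def corrected_for_guessing(p: float, chance: float = CHANCE) -> float:
    """Rescale so that chance performance maps to 0 and perfect performance to 1."""
    return (p - chance) / (1 - chance)


def temporal_curve(correct: list[bool]) -> list[tuple[int, float, float]]:
    """(start minute, proportion correct, corrected proportion) for each full unit.

    An incomplete final unit is ignored.
    """
    curve = []
    for i in range(len(correct) // UNIT):
        block = correct[i * UNIT:(i + 1) * UNIT]
        p = sum(block) / UNIT
        curve.append((i * MINUTES_PER_UNIT, p, corrected_for_guessing(p)))
    return curve


if __name__ == "__main__":
    # Hypothetical session: 35 of 50 correct in the first unit, 20 of 50 in the second.
    trials = [True] * 35 + [False] * 15 + [True] * 20 + [False] * 30
    for minute, p, p_corr in temporal_curve(trials):
        print(minute, p, round(p_corr, 2))
```

Plotting the corrected proportion against the start minute gives one form of the temporal activity curve mentioned above; repeating the session at several doses gives the dosage curve.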

The PSI Apparatus. Members of our group have devoted more
effort to the development of this technique and the use of it in a
number of different situations, including those involving drugs,
than we have to any other single test. The PSI apparatus (a
contraction of “Problem Solving Using Information”) permits the
combination of a number of elements into various logical
relationships. The elements are represented by invariant electronic
connections which in the latest version are transistors built into the
apparatus. The logical relationships between these elements can be
varied by means of a plugboard in the rear of the device. A set of
logical relationships between the elements constitutes a problem. The
connections determining a particular problem are wired on to a plug
which is inserted into the plugboard to constitute each problem.

On the panel is a circular array of lights with corresponding
pushbuttons. Each light represents an element and, depending on
whether it is on or off, the state of the element. In the center of
the panel is a light with no pushbutton, which represents the output
of the circular array. A disc that can be placed in the center of the
array has arrows drawn on it which show the relationships between
the elements. For each problem there is a different disc. All
relationships are represented by arrows, and each depicted arrow
stands for an existent relationship. The relationships possible are
conjunction, disjunction, and negation (hence also implication). The
direction of relationships is indicated by the heads of the arrows,
which indicate only the existence and the direction of the
relationship, not the kind of relationship. For example, the arrow
between lights 5 and 2 might mean a) that if 5 is lit, then 2 will
light, or b) that to light 2, it is necessary but not sufficient to light
5, or c) that 5 prevents the lighting of 2. If there is no arrow between
two lights, there is no relationship between them; in other words, the
null relationship is not presented.

In all problems the task is the same. The central light, representing
the output of the network of elements, must be turned on by some
combination or sequence of activation of three particular elements,
which are referred to as input elements. Any element or combination
of elements can be activated by depressing the appropriate
pushbuttons. In order to learn how to produce the output of the
network (to solve the problem), the subject must analyze the
relationships that exist between the input elements.

When an element is activated, the light representing that element
on the display panel goes on. Three seconds later, that light goes
out and the lights for which the preceding light represents a
sufficient condition for activation go on, remain on for three
seconds, and in turn go out, activating those elements for which
they are sufficient. In this way, an ordered set of consequences of
the activation of any combination of elements is presented to the
subject. The subject is free to add further activated elements while
the consequences of previous button pushes are continuing. He may
thus observe sequences of events resulting from the activation of
elements and the interaction of multiple sequences. By trying out
what happens when various lights or combinations of lights are on,
the subject can uniquely determine the nature of all relationships
in the network, and thus find out how to solve the problem.
Subjects become familiar with the apparatus and situation by means
of comprehensive illustrative problems presented in detail by the
experimenter, and solve these sample problems before measurement
begins. Average time for familiarization and testing on the two
problems used most frequently is about two hours altogether.
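
The timing rule just described can be imitated in a few lines of Python. The sketch is a hypothetical model, not the PSI apparatus's actual circuitry: each element lights on a given three-second step if one of its rules is satisfied by the lights that were on during the preceding step. A rule lists the elements that must all be on (one element gives simple implication, several give conjunction) and, optionally, elements that must be off (negation); disjunction is written as two separate rules for the same target. The `OUTPUT` label for the central light and the element numbers are illustrative.

```python
from dataclasses import dataclass

STEP_SECONDS = 3
OUTPUT = "out"  # hypothetical label for the central output light


@dataclass(frozen=True)
class Rule:
    target: object
    requires: frozenset                  # all must be lit on the previous step
    inhibitors: frozenset = frozenset()  # none may be lit on the previous step


def step(lit: frozenset, rules: list[Rule]) -> frozenset:
    """Lights that come on three seconds after the set `lit` was showing."""
    return frozenset(r.target for r in rules
                     if r.requires <= lit and not (r.inhibitors & lit))


def simulate(rules: list[Rule], presses: dict[int, set], steps: int):
    """Return (seconds, lit elements) per step; `presses` maps step -> buttons pushed."""
    lit = frozenset()
    history = []
    for t in range(steps):
        # Earlier lights go out; their consequences plus any new presses come on.
        lit = step(lit, rules) | frozenset(presses.get(t, set()))
        history.append((t * STEP_SECONDS, lit))
    return history


if __name__ == "__main__":
    rules = [
        Rule(2, frozenset({5})),                   # 5 is sufficient for 2
        Rule(OUTPUT, frozenset({1, 3})),           # 1 and 3 together light the output
        Rule(4, frozenset({1}), frozenset({6})),   # 1 lights 4 unless 6 is on
    ]
    for seconds, lights in simulate(rules, {0: {1, 3}}, steps=3):
        print(seconds, sorted(map(str, lights)))
```

Pressing 1 and 3 together lights the central light three seconds later, which is the kind of sequence a subject would have to discover by experiment.
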

The raw material for our analysis of the problem solving process
is the sequence of experiments that are performed, that is, the
sequences of buttons pushed and the times at which they are pushed.
These data are recorded automatically on tape which can be
examined and analyzed along many mathematically derived
dimensions.

The information content of the network representing any problem
can be determined in bits. With the possible exception of Raven's
matrices (8), this is the only currently available test of mental
abilities made up of problems that increase in complexity in
comparable units (bits). It differs from the classical intelligence tests
like the Stanford-Binet and Wechsler-Bellevue, in which the tasks
for various age levels or subtests are entirely different in character,
and consequently do not have known commensurability of
dimensions. The problems of the PSI apparatus increase in
complexity from a one-bit problem (push a certain button and the
central light goes on), the “earthworm floor,” to a very complicated
sequence of button pushes, the “Einstein ceiling,” by simply adding
units of comparable character, each representing one of the three
logical relationships.

By analyzing the nature of the actions carried out during the
problem solving process, taking into account their order, it is
possible to relate each step to the information gained by the subject
up to that point, and also to relate the product of the process, the
solution, to the information state. The raw data
permit the quantification of a large number of variables, some of 
which seem to be of reasonable statistical independence. Some of 
these variables are power variables such as the time required for 
solution, or the number of experiments required for solution. Others 
are more validly process variables, and have consequently been 
of greater interest to us. 
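
The two power variables named above can be read directly off the recorded sequence. The record layout below is a hypothetical stand-in for the tape format, which the text does not describe: each push carries its time from the start of the problem, the buttons pressed, and (by assumption) whether the central light came on in consequence. Treating each recorded push as one “experiment” is likewise an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Push:
    seconds: float      # time since the problem was presented
    buttons: frozenset  # elements activated together
    output_lit: bool    # assumed to be logged: did the central light come on?


def power_variables(log: list[Push]) -> Optional[tuple[float, int]]:
    """(time to solution in seconds, number of experiments), or None if unsolved."""
    for count, push in enumerate(log, start=1):
        if push.output_lit:
            return push.seconds, count
    return None


if __name__ == "__main__":
    log = [
        Push(4.0, frozenset({1}), False),
        Push(11.5, frozenset({1, 3}), False),
        Push(20.0, frozenset({2, 3}), True),
    ]
    print(power_variables(log))  # (20.0, 3)
```

The process variables the text prefers would need the full ordered sequence, not just its endpoint, which is why the whole log is kept.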

This procedure puts a sort of microscope on cognitive processes, 
including memory, learning, reasoning, and information processing. 
We are carrying out factor analytic studies to learn the dimension- 
ality of the domain of 40 or more variables, which are derivative of 
CGS and information units exclusively. The factor analytic studies 
are being done both on individuals and on groups. We hope thereby 
to find ways in which individuals and groups, or normal individuals 
and individuals under stress, differ in their cognitive processes. 
We are studying the effects of drugs on such performance, both 
in diminishing cognitive functions— behavioral toxicity— and also 
improving it. Horvath, Uhr, Kelly, and Rapaport are involved in 
various aspects of this work. 

Stroud Apparatus. Foster, of our group, has conducted prelimi- 
nary studies with the Stroud apparatus. This equipment makes it 
possible to present auditory stimuli at regular intervals whose dura- 
tion can be varied by the experimenter. There are also two visual 
stimuli, left and right bulbs which glow at a fixed time after the 
auditory stimulus. This fixed time can be altered by the experi- 
menter, as can the time between the glowing of the two bulbs. If 
Stroud’s hypothesis is correct, when the two visual stimuli appear
in sequence a certain brief interval after the regular and rapid
auditory stimulus, the order of the glowing of the two lights will
be correctly perceived no better than chance. When they glow
at about the time the next auditory stimulus is expected, their order
will always be correctly perceived. We believe that this interval,
if it exists, may well be altered by drugs or other stresses.

The Ft Apparatus. This equipment consists of eight translucent 
knobs 2% inches in diameter mounted on 9 x 9 inch squares 
of black plywood. Each knob contains a light. The eight are
arranged in a semicircle around the subject. They can be moved
toward or away from the center and are fixed at a distance where
the radius of the semicircle is the length of the subject’s arm. The
subject is seated so that the shoulder of the arm he uses is at the
center. Timers control the intervals between the appearance of
successive lights, and the duration of the “on” time for each light.
When a light goes on, the subject is instructed to put it out by
hitting the knob. The recording is automatic. Using this equipment,
Kornblum has been able to find definite drug effects.
Tracking Apparatus. This is a standard task which has been
studied extensively for itself alone, but has scarcely been used in
evaluating stress effects, although it appears to us to have great
promise in this field. The subject’s task is to keep a blip on an
oscilloscope, the motion of which is controlled by an electronic
problem generator, directly on cross-hair lines. He does this by
moving a joy-stick. The blip can move at different rates and with
differing regularity or randomness. The subject's error in the track-
ing task can be analyzed into factors due to a) lag in response, b)
misperception of spatial distance, c) misperception of speed of mo-
tion, and d) misperception of acceleration of motion. Each of these
error components can be automatically recorded in digital fashion.
It is quite possible that some drugs or other stresses will affect
certain of these performance factors, while others will affect other
factors, or other functional subsystems.
These are a few of the types of apparatus with which we are now
experimenting. It is important, of course, to develop norms for per-
formance on them and to learn all we can about the variables they
measure. Many may correlate highly with variables in pencil and
paper tests or on rating scales. But we believe that, as there are ad-
vances in the use of such instruments, they may provide precision
of personality measurement beyond that which at the moment
exists.
Summary and Conclusions

Harold B. Pepinsky
The Ohio State University 


A FRIEND RECENTLY COMMENTED that psychologists do not listen to
other people, because psychologists prefer to tell about what they
are doing. My friend, who is himself not one of the faithful, had just
returned from a conference attended mainly by psychologists, and
this was the impression they had given him. If one examines the
reference lists appended to the foregoing papers, however, one must
infer that this group of psychologists, at least, is well aware of past
and current literature on the assessment of human personality. Yet
with the exception of Robert Watson, our historian, the symposium
participants have run true to my friend’s impression. In a very
human way, each has made reference to the work of others as a
means of justifying what he is up to. A brief review of what has
been said and some concluding remarks are now in order.


A Review 

Robert Watson has traced for us the growing concern of Ameri-
can psychologists for objectivity in the measurement of (a) stimulus
conditions, (b) the observer, and (c) the responses of the person
being observed. At the same time, Watson argues, American psy-
chologists have increased both in their awareness of the complexity
of measuring human personality and in their willingness to tackle
complicated problems of measurement. He brings his chapter up
to date by reporting on definitions of “objective personality assess-
ment,” with which he has been furnished by the other symposium
participants.

Performance, projection, and self-description approaches are criti-
cally reviewed by Donald Super, who comments also upon the
waxing and waning of fads in assessment theory and methods. He
concludes that projection (projective) approaches have little prac-
tical utility at either the public or private levels of measurement.
While Super does not reject the use of performance measures, he
cites the higher predictive validity of self-description measures,
particularly of the biographical (i.e., autobiographical) inventory.

The next participant, Raymond Cattell, does not like scaling
methods of assessing personality, preferring instead factor analytic
methods applied to what he calls “multivariate experiment” in the
natural setting. Cattell is armed with an impressive array of rating,
questionnaire, and objective test data, which have been digested in
large amounts by electronic computers. He argues that multivariate
factor analysis is analogous to the clinical method of personality
assessment: factor analysis enables the psychologist to remain close
to his data as he interprets them and yields information on both
known and unknown clinical constructs.


Louis McQuitty, however, thinks that his method of “successive
agreement analysis,” used to isolate and differentiate personality
types, is more flexible than traditional factor analysis. This is be-
cause the latter method requires items of invariant validity for all
persons sampled, whereas the former assumes items to have differ-
ential validity for different groupings of persons. McQuitty’s
method appears to be a kind of Q technique, in which persons are
intercorrelated across a set of items and then, by iterative procedure,
grouped into successively larger clusters of like-responding persons.
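
The iterative grouping of like-responding persons can be illustrated with a simplified agglomerative sketch. It is not McQuitty’s published procedure: it assumes that “agreement” between two persons is just the number of items they answer identically, and that two clusters are compared by their average pairwise agreement. The function names are hypothetical.

```python
from itertools import combinations


def agreement(a, b):
    """Number of items on which two persons give the same response."""
    return sum(x == y for x, y in zip(a, b))


def cluster_by_agreement(responses, n_clusters):
    """Merge persons into successively larger clusters of like-responding persons.

    responses: one response pattern per person (any sequence of item responses).
    Clusters are compared by average pairwise agreement (an assumption of this sketch).
    """
    n_clusters = max(n_clusters, 1)
    clusters = [[i] for i in range(len(responses))]  # each person starts alone
    while len(clusters) > n_clusters:
        best = None
        for (i, ci), (j, cj) in combinations(enumerate(clusters), 2):
            total = sum(agreement(responses[p], responses[q]) for p in ci for q in cj)
            score = total / (len(ci) * len(cj))
            if best is None or score > best[0]:
                best = (score, i, j)
        _, i, j = best
        merged = clusters[i] + clusters[j]
        # Drop the two merged clusters and append their union.
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters


# Hypothetical True/False answers of four persons to four items.
answers = ["TTFT", "TTFF", "FFTF", "FFTT"]
print(cluster_by_agreement(answers, 2))  # [[0, 1], [2, 3]]
```

Unlike a factor analysis of items, nothing here requires an item to mean the same thing for every person; persons are grouped only by how alike their whole response patterns are.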
Irwin Berg seriously questions whether particular item content is
important in differentiating among “operationally clean” criterion
groups of persons and supports this statement by reference to his
own and others’ research. In his view, it is the characteristic re-
sponse pattern, or set, of such a group of persons, measured across
widely different kinds of stimulus conditions, that distinguishes the
group from other groups (or from people in general). This assump-
tion underlies Berg’s “deviation hypothesis,” which states that a
group that differs from other groups in what Berg has designated
“critical areas of behavior” will differ also in its noncritical behavior.


Such group differences are shown to occur, moreover, in the
amount of favorableness assigned to personality test items. And
this empirical finding is used by Allen Edwards in developing “so-
cial desirability” scales. His studies of social desirability exemplify
what he refers to as the “construct method” of personality test con-
struction, which he clearly prefers to factor analytic or “criterion
group” methods. The idea here is that individuals and groups differ
not only in their tendencies to select “True,” “False,” or “Undecided”
responses to items, but in their tendencies to give socially desirable
responses to items as well (regardless of whether the socially desir-
able responses are “True” or “False”). Thus it is considered par-
simonious to regard social desirability as being a major contributor
to the variance of personality test items. For example, people in
general can be expected to regard clinically designated item
clusters such as those characterizing “depression” or “schizophrenia”
as being socially undesirable, and this expectation is supported by
the research of Edwards and his associates. Also, persons who score
“high” on these socially undesirable item clusters can be expected
to obtain “low” social desirability scores, independently measured,
on other tests, which such persons tend to do.
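
The scoring idea can be made concrete with a short sketch under stated assumptions. Each item is assumed to carry a key marking which answer (“True” or “False”) is the socially desirable one. A respondent’s social desirability score is taken to be the number of items answered in that direction, and its relation to a clinical scale score is checked with an ordinary Pearson correlation. The function names and all data are hypothetical, not Edwards’ actual scales.

```python
def sd_score(answers, desirable_keys):
    """Count items answered in the socially desirable direction.

    answers and desirable_keys are parallel lists of booleans (True = "True"
    response); a match counts whether the desirable answer is "True" or "False".
    """
    return sum(a == k for a, k in zip(answers, desirable_keys))


def pearson_r(x, y):
    """Plain Pearson product-moment correlation of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


# Hypothetical four-item key: the desirable answer to item 3 is "False".
keys = [True, True, False, True]
respondents = [
    [True, True, False, True],   # every item answered in the desirable direction
    [True, False, False, True],
    [False, False, True, True],
    [False, False, True, False],
]
sd = [sd_score(r, keys) for r in respondents]  # [4, 3, 1, 0]
clinical = [2, 5, 9, 12]                       # hypothetical "depression" scale scores
r = pearson_r(sd, clinical)                    # negative for these made-up data
```

With real data, a strongly negative correlation of this kind is what the text describes: high scorers on socially undesirable clusters tend to obtain low, independently measured social desirability scores.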
James Miller not only sounds like an electronics engineer in his
paper, he makes clear to us that he wants to sound like an elec-
tronics engineer in his approach to the problem of personality
assessment. Most of the physical and biological sciences, he points
out, have common units of observation and measurement (the cen-
timeter-gram-second system) and, more recently, of enumeration
(the bit of information) and a classical conceptual system. In stark
contrast, psychological language, measurement, and conceptualiza-
tion is a mess! His compelling argument is that the concepts and
methods of general systems theory can be used to advance the unity
of physical and behavioral science. First, he suggests how terms
frequently used in the description of human personality can be
more precisely redefined and measured if the human organism is
viewed as a component in an electronic system; second, he describes
novel test situations that are being developed at the University of
Michigan’s Mental Health Research Institute; and third, he shows
how such tests can be used to provide precise measures of person-
ality. As the title of his paper suggests, a communicable and valid
personality theory is in the making, but it is not now available.


Holtzman has taken as his starting point an assessing device, the
Rorschach, and modified it into a task that utilizes more objective
stimulus conditions and scoring procedures. In this way he has
sought to combine the qualitative richness of this projective
method with the quantitative rigor of psychometric analysis. The
result, the Holtzman Inkblot Test, is described following a brief
yet comprehensive review of methodological problems that arise in
dealing with projective materials.
Bernard Bass has provided an exceptionally well organized and
scholarly background for his discussion of the Louisiana State Uni-
versity leadership studies, including an extensive review of pub-
lished attempts to assess the leadership variable by objective
methods. The maturity of his paper, in an area that has been
characterized by procedural sloppiness and sweeping generalization,
is a testimonial to the value of a research worker’s dedication to a
restricted methodology which, in this case, centers on the study of
changes of attitudes or preference in groups. Bass’s dedication has
yielded an enormous supply of empirical data, which he has been
able to organize into theoretical propositions that in turn have
suggested meaningful and valid predictions. Concurrently, he has
developed methods for the rapid processing and analysis of his data;
e.g., they are fed directly into an electronic analog computer.
Although the yield is “artificial, unnatural, and limited” (to use his
own terms), his work has generated a number of hypotheses that can
be tested in natural settings, outside of the laboratory.

To the clinician who is fearful that his services are about to be
dispensed with, William Hunt’s paper on an “actuarial approach to
clinical judgment” will bring reassurance. What Hunt urges, in
effect, is that the process of inductive logic, which human organisms
have and which electronic computers (to date, at least) do not, be
capitalized upon in the assessment of human personality. The
process of clinical judgment, like paper and pencil tests, can be
standardized in principle, and Hunt has shown in a number of
studies, referred to in his paper, that clinicians’ interpretations of
responses to complex stimulus situations can be highly reliable.
He stresses, however, that agreement among judges in such
situations demands that the judges be carefully
of giving and scoring (can a high grade
clerk or a technician do it?) illustrates the latter objective; increas-
ing the number of “hits” in predicting from test to criterion data
illustrates the former. Hathaway warns, however, against attempts
to predict diagnoses or clinical estimations to the exclusion of what
he considers a more important kind of validation, i.e., that obtained
in the process of construct development. The construction of his
own Psychopathic Deviate (Pd) scale, one of the measures yielded
by his Minnesota Multiphasic Personality Inventory, is cited as a case
in point. Development of this construct, for example, gave rise to
the belief that diagnosed Pd’s would be relatively deficient in their
emotional functioning. A subsequent experiment demonstrated that
high Pd’s indeed were less susceptible than low Pd’s to autonomic
conditioning.
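
Counting “hits” in predicting from test to criterion data can be sketched in a few lines. The sketch assumes one test score per person, a binary criterion (for example, whether a diagnosis was later confirmed), and a prediction rule of the form “score at or above a cutoff.” The data and function names are hypothetical illustrations, not Hathaway’s procedure.

```python
def count_hits(scores, criterion, cutoff):
    """Count persons for whom the prediction 'score >= cutoff' matches the criterion."""
    return sum((s >= cutoff) == c for s, c in zip(scores, criterion))


def best_cutoff(scores, criterion):
    """Try every observed score as a cutoff; return the one with the most hits.

    Ties go to the lowest such cutoff, because max() keeps the first maximum.
    """
    return max(sorted(set(scores)), key=lambda k: count_hits(scores, criterion, k))


# Hypothetical scale scores and criterion outcomes (True = diagnosis confirmed).
scores = [45, 52, 58, 61, 67, 72]
criterion = [False, False, True, False, True, True]
cut = best_cutoff(scores, criterion)
print(cut, count_hits(scores, criterion, cut))  # 58 5
```

Maximizing hits in this way is exactly the kind of criterion-bound validation that, on Hathaway’s argument, should not be pursued to the exclusion of construct development.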

Some Concluding Remarks

The symposium participants, as a group, are dissimilar in their
conceptions of how personality is to be assessed. Thus, for example,
the advocated raw data of assessment range from expressed clinical
judgments of patient behavior in the natural setting to subject re-
sponses in a controlled laboratory situation that can be fed directly
into an electronic computer. On one point, at least, there is over-
whelming agreement: whatever assessment data are used must, in
principle, be capable of being defined empirically, measured in
amount, and recorded as scores in the public domain. It is this kind
of belief in “objective” measurement, so obviously a shared con-
viction of the contributors, that characterizes the present symposium.

On the issue of objectivity versus subjectivity in the assessment
of personality, then, there is no quarrel among the participants. But
there are other issues on which the participants appear to be in
hearty disagreement, or where additional problems of research
can be expected. This may seem strange, when one stops to consider
that a response made by a subject to a defined stimulus condition is,
simply, the product of an interaction between subject and stimulus
condition. Yet several methodological questions …

An additional question, not touched upon by the participants, is
nonetheless relevant: (5) In whose view is the stimulus condition
defined, and in whose view is the subject’s response to be assessed? Our
Ohio State University research on productivity in organizational set-
tings has suggested the importance of maintaining conceptual dis-
tinctions here among (a) the subject, (b) the task-setter who
prescribes the stimulus condition, and (c) the observer who reports
on and analyzes the subject’s response and who may or may not be
identical with the subject or the task-setter. Thus, Hunt’s clinical
judges must be viewed at the same time both as observers of patients
whose responses are to be predicted, and as themselves subjects
whose clinical predictions are to be treated as responses. A related
methodological question also must be asked: (6) What is the sub-
ject’s phenomenal view (as a measured response) of the task-setter
and the stimulus condition that he prescribes for the subject? It is
important to know whether the subject being assessed, e.g., in
Bass’s laboratory or in a Hathaway hospital setting, regards himself
as playing a game as opposed to playing “for keeps,” whether the
task-setter is regarded as hostile, friendly, or neutral, or whether
the task condition has face validity for the subject as something
that is important for him to perform.*

In retrospect, one could wish that questions of this sort could
have been dealt with explicitly by all of the participants in the
symposium. Their probable lack of consensus in answering such
questions can be inferred, I think, from the participants’ lack of
specific agreement in defining the term “objective,” despite their
shared concern for measurement and for scores (see Chapter 1).
Lack of consensus, or even of concern, can be inferred, too, from
unresolved issues, to some of which we can turn now.

Berg and Cattell, for instance, differ on the issue of item inter-
pretation, Berg implying that particular item content can safely be
ignored and Cattell suggesting that subject-item interactions can
be interpreted to yield useful clinical constructs. Again, Cattell
differs from Super in respect to the manifest versus the latent
meaning of an item. Super wants the item to have the same
meaning for the subject as it has for Super, while Cattell does not
want the subject to be able to guess what the item means to Cattell.
A second is the validity issue, with Super arguing for the utility
of predictive validity, while Edwards and Hathaway expound construct
validity but do not see eye to eye about its purpose. Edwards
shows that considerable item response variance, otherwise attrib-
utable to specific clinical syndromes, can be accounted for more
parsimoniously in terms of Edwards’ social desirability component.
Hathaway, a practicing and practical clinician, argues for what
works and has utility in the hospital setting, by which criteria the
presence or absence of a social desirability loading may be irrelevant
to the interpretation of an item.

* See Criswell, Joan. The psy
How item responses should be analyzed is a third issue, with
Cattell preferring factor analysis, McQuitty supporting successive
agreement analysis, and Edwards expounding scaling. A fourth
issue is that of the laboratory versus a natural setting as the locus
of data collection, although even Cattell, the most vigorous exponent
of “observation in situ,” imposes behavioral restrictions on his sub-
jects in the process of obtaining information about them. Hunt’s
judges, of course, are quite free to observe patients’ natural move-
ments in the hospital setting; even here, though, it is assumed that
the reliability of predictions among judges is increased by control
of the stimuli to which they are exposed. Bass, in contrast, is un-
ashamedly at work in a laboratory, seeking lawful relationships
among variables systematically controlled and manipulated.

Strangely enough, none of the authors has stopped to define the
term “personality.” Therefore, we cannot know whether this is a
definitional problem on which they are at issue. Their chapters
indicate, however, that the participants are otherwise impressed
with the need for definitional clarity and empirical referents for
their variables, and for obtaining reliable scores based on careful
observation and measurement of the variables. What emerges from
this preoccupation is a kind of stubborn, but highly commendable,
simplicity in the authors’ discussions of their work. We may con-
clude that the apparent simplicity results precisely because the
empirical study on which the chapters are based has been
painstaking and extensive; in every case, each author knows what
he is writing about. On the one hand, this gives encouragement to
the belief that we are closer than ever before to being the proud
possessors of psychological laws. On the other hand, the implica-
tions of the several chapters are so often at variance with each
other that the claim of any of the participants to the possession of
knowledge that orders psychological variables in lawful association
with each other is somewhat weakened.

Nearly all of the authors give clear indication of their recognition
that human life is very complicated and … Each takes … and
evidences a belief in the validity and urgency …

What emerges as a dominant theme of the symposium is a pre-
occupation with method, instrumentation, and the interpretation
of data. For example, the heuristic yield of new methods for the
rapid collection and processing of data is taken for granted by 
several of the authors, who promote the use of electronic gadgetry 
with great enthusiasm. An implicit and seductive argument that 
underlies this zeal for a “psychonomy of abundance” is that if we 
can get enough data, we are bound to have good data. 

Curiously, also— in view of a Zeitgeist for interdisciplinary re- 
search (e.g., among research fund-granting agencies), Miller is the 
only member of the group who openly and strongly advocates an 
interdisciplinary research effort. None of the participants has paid 
more than passing lip service to the possible contributions that 
sociologists, anthropologists, or other kinds of social scientists might 
make to problems of assessing human personality. Is this because 
other social scientists have nothing to contribute to the symposium 
topic? One regrets that this question did not become an issue for 
the symposium. 

Despite a few understandable sins of commission and omission, 
there is much to read in the present series of papers that is rich 
and satisfying. A new and abundant harvest of ideas and facts and 
methods of inquiry has been made available for research workers 
and practitioners alike. The papers give heartening testimony to 
the fact that one can stay very close to one’s data yet, in making 
sense out of them, manifest considerable imaginativeness and pro- 
ductive originality. Certainly, by comparison with the OSS and VA 
assessment research of the previous decade, there is ample evidence 
furnished us here of significant accomplishment in the development 
of a methodology for the objective assessment of human personality. 